You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/09/10 22:06:17 UTC

[GitHub] [commons-collections] Claudenw opened a new pull request, #335: Collections 384 bit count producer operation is not clearly defined

Claudenw opened a new pull request, #335:
URL: https://github.com/apache/commons-collections/pull/335

   fix for COLLECTIONS-384
   
   Document bit count producer operation in the BitCountProducer.java file.
   
   Add unit tests for all BitCountProducers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-384][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998704193


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();
+        Arrays.stream(createProducer().asIndexArray()).boxed().forEach( lst::add );
+        for (int i : getExpectedIndices()) {
+            assertTrue( lst.contains(i), "Missing "+i );
+        }
+    }
+
     @Test
     public final void testForEachIndex() {
+        //IndexProducer producer = createProducer();

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSparseBloomFilterTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromSparseBloomFilterTest extends AbstractBitCountProducerTest {
+
+    protected Shape shape = Shape.fromKM(17, 72);
+
+    @Override
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(0, 1);
+        BloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(hasher);
+        return BitCountProducer.from(bf);
+

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1002407907


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();
+        createProducer().forEachCount((i, j) -> list.add(i));
+        int[] actual = list.toArray();
+        if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
+            int[] expected = Arrays.stream(actual).sorted().toArray();
+            Assertions.assertArrayEquals(expected, actual);
+        }
+        if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
+            long count = Arrays.stream(actual).distinct().count();
+            Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   Replaced javadoc text with this code..



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008717726


##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends AbstractIndexProducerTest {

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),

Review Comment:
   Changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008716958


##########
src/test/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasherTest.java:
##########
@@ -35,6 +36,12 @@ protected Hasher createEmptyHasher() {
         return NullHasher.INSTANCE;
     }
 
+

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1288752794

   I discovered issues with the HasherCollection and Hasher generated IndexProducer and BitCountProducers created from those IndexProducers.  Additional tests and corrections included in this latest push.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1288857212

   Last commit was on Oct 21st. Is this correct or do you need to push again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998693769


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,29 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
-
+    /**
+     * A testing BitCountConsumer that always returns true.
+     */
+    private static final BitCountConsumer TRUE_CONSUMER = (i, j) -> true;
     /**
      * A testing BitCountConsumer that always returns false.
      */
-    public static BitCountConsumer FALSE_CONSUMER = new BitCountConsumer() {
-
-        @Override
-        public boolean test(int index, int count) {
-            return false;
-        }
-    };
+    private static final BitCountConsumer FALSE_CONSUMER = (i, j) -> false;
 
     /**
-     * A testing BitCountConsumer that always returns true.
+     * Creates an array of integer pairs comprising the index and the expected count for the index.
+     * @return an array of integer pairs comprising the index and the expected count for the index.
      */
-    public static BitCountConsumer TRUE_CONSUMER = new BitCountConsumer() {
-
-        @Override
-        public boolean test(int index, int count) {
-            return true;
-        }
-    };
+    protected abstract int[][] getExpectedBitCount();

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1295877575

   In all cases where `AS_ARRAY_ORDERED` is set `FOR_EACH_ORDERED` is also set, and the `FOR_EACH_COUNT_ORDERED` is therefore set as well.
   
   In all cases where `AS_ARRAY_DISTINCT` is set `FOR_EACH_DISTINCT` is also set, and the `FOR_EACH_COUNT_DISTINCT` is therefore set as well.
   
   I propose collapsing this into 2 statics:
   `INDICES_ORDERED` : replaces `AS_ARRAY_ORDERED`, `FOR_EACH_ORDERED`, and `FOR_EACH_COUNT_ORDERED`.
   `INDICES_DISTINCT`: replaces `AS_ARRAY_DISTINCT`, `FOR_EACH_DISTINCT` and `FOR_EACH_COUNT_DISTINCT`.
   
   By "replaces" I mean remove all instances of the replaced constants (not make them equivalent).  
   Add documentation that states the testing assumes the ORDERED and DISTINCT properties carry across `asArray()`, `forEachIndex()` and `forEachCount()`.
   
   Also, since the `AbstractBitCountProducerTest ` extends `AbstractIndexProducerTest` we can remove all `IndexProducerFrom_X_Test` implementations where there is a `BitCountProducerFrom_X_Test`
   
   @aherbert unless you have an objection, I'll proceed with these changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1296319411

   Hi @Claudenw, consolidating all the behaviour flags is a good idea. It will avoid future tests failing to set them correctly. I'll wait until you have done that for a review.
   
   Note: I have reservations about the HasherCollection and the requirement for an `absoluteUniqueIndices` method. In summary:
   
   1. I think it should be resolved after this PR
   2. Since the HasherCollection violates the IndexProducer contract by outputting more than k indices, we may need to revise the contract and javadoc as being able to output up to a multiple of k indices for a Shape. This would be helped by a size-type method in the IndexProducer interface to state what count of k will be output.
   3. The method `absoluteUniqueIndices` may be better served by a HasherSet, or some other naming variant, that ensures all indices output by the IndexProducer are unique.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997172420


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +48,37 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
-    }
 
     @Test
-    public final void testForEachCount() {
-
+    public final void testForEachCountResults() {
         assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
         assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
-        }
+        assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
+        assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    protected abstract int[][] getExpectedBitCount();

Review Comment:
   moved and code added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1003576587


##########
src/main/java/org/apache/commons/collections4/bloomfilter/Hasher.java:
##########
@@ -49,7 +49,8 @@ public interface Hasher {
      * Creates an IndexProducer of unique indices for this hasher based on the Shape.
      *
      * <p>This is like the `indices(Shape)` method except that it adds the guarantee that no
-     * duplicate values will be returned</p>
+     * duplicate values will be returned.  The indices produced are equivalent those returned

Review Comment:
   `equivalent to those`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained

Review Comment:
   delete `of` before comprising



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>
+     *
+     * <p>A BitCountProducer generated from this IndexProducer is equivalent to a BitCountProducer from a
+     * counting Bloom filter that was constructed from the contained hashers.<p>

Review Comment:
   IIUC this is only true if the counting Bloom filter was constructed from unique indices from the contained hashers. So perhaps change to: `constructed from the contained hashers unique indices.`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),

Review Comment:
   This tests a saturation of the number of bits. We should really have a test for non saturation, no collisions just to make sure that works as well.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -141,29 +172,16 @@ public boolean forEachIndex(IntPredicate consumer) {
 
         @Override
         public int[] asIndexArray() {
-            List<int[]> lst = new ArrayList<>();
-            int[] count = new int[1];
-            /*
-             * This method needs to return duplicate indices
-             */
-            for (Hasher hasher : hashers) {
-                int[] ary = hasher.indices(shape).asIndexArray();
-                lst.add(ary);
-                count[0] += ary.length;
-            }
-            if (lst.isEmpty()) {
-                return new int[0];
-            }
-            if (lst.size() == 1) {
-                return lst.get(0);
-            }
-            int[] result = new int[count[0]];
-            int offset = 0;
-            for (int[] ary : lst) {
-                System.arraycopy(ary, 0, result, offset, ary.length);
-                offset += ary.length;
-            }
-            return result;
+            int[] result = new int[shape.getNumberOfHashFunctions()*hashers.size()];

Review Comment:
   This is a much better implementation.
   
   Can you add whitespace around the multiply ` * `.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 2)).absoluteUniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new HasherCollection().uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return 0;

Review Comment:
   Should this have flags for distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 2)).uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(NullHasher.INSTANCE.uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered

Review Comment:
   `// HasherCollection allows ...`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasherTest.java:
##########
@@ -35,6 +36,12 @@ protected Hasher createEmptyHasher() {
         return NullHasher.INSTANCE;
     }
 
+

Review Comment:
   Remove extra line



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>

Review Comment:
   returning an IndexProducer with the concatenated output indices from each filter.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends AbstractIndexProducerTest {

Review Comment:
   Remove the 2 from the name suffix



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();

Review Comment:
   You have not removed the redundant part of this test. The distinctness is tested in `testBehaviourAsIndexArray`.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -106,6 +120,23 @@ public boolean forEachIndex(IntPredicate consumer) {
         };
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices across all the contained
+     * hashers.
+     *
+     * <p>This is equivalent to an IndexProducer created from a Bloom filter that comprises all
+     * the contained hashers.</p>
+     *
+     * @param shape the shape of the desired Bloom filter.
+     * @return the iterator of integers
+     */
+    public IndexProducer absoluteUniqueIndices(final Shape shape) {
+        return consumer -> {
+            Objects.requireNonNull(consumer, "consumer");
+            return uniqueIndices(shape).forEachIndex(IndexFilter.create(shape, consumer));

Review Comment:
   IIUC this can exceed the capacity of the IndexFilter. The uniqueIndices(shape) will output up to k indices for each hasher in the collection (eliminating duplicates). Thus if we have two hashers with shape(k=5, n=1000) we could output 10 indices from the combined unique indices. Then we create an IndexFilter for shape which could be optimised only to check up to k=5 indices.
   
   This throws an index out of bounds exception from IndexFilter.ArrayTracker.
   
   ```Java
   @Test
   public void testAbsoluteUniqueIndices() {
       int[] actual = new HasherCollection(
           new IncrementingHasher(1, 1),
           new IncrementingHasher(10, 1)
       ).absoluteUniqueIndices(Shape.fromKM(5, 1000)).asIndexArray();
       int[] expected = IntStream.concat(
               IntStream.range(1, 1 + 5),
               IntStream.range(10, 10 + 5)
           ).toArray();
       Assertions.assertArrayEquals(expected, actual);
   }
   ```
   
   When I update HasherCollection absoluteUniqueIndices to use a fake shape that forces the IndexTracker.BitMapTracker implementation then the above tests passes:
   ```Java
   public IndexProducer absoluteUniqueIndices(final Shape shape) {
       return consumer -> {
           Objects.requireNonNull(consumer, "consumer");
           return uniqueIndices(shape).forEachIndex(IndexFilter.create(
                   Shape.fromKM(shape.getNumberOfBits(), shape.getNumberOfBits()), consumer));
       };
   }
   ```
   
   I think that this can be easily solved by having a new package-private factory constructor in IndexFilter to choose the implementation that can handle exceeding the number of hash functions in the shape:
   
   ```Java
   static IntPredicate create(int numberOfBits, IntPredicate consumer) {
       // Choose the BitMapTracker implementation to handle any number of hash functions
   ```
   
   Or you could create a new Shape as:
   ```Java
   public IndexProducer absoluteUniqueIndices(final Shape shape) {
       return consumer -> {
           Objects.requireNonNull(consumer, "consumer");
           return uniqueIndices(shape).forEachIndex(IndexFilter.create(
                   Shape.fromKM(shape.getNumberOfHashFunctions() * hashers.size(),
                                shape.getNumberOfBits()), consumer));
       };
   }
   ```
   Perhaps the later is simpler.
   
   
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/UniqueIndexProducerFromHasherCollectionTest.java:
##########
@@ -36,4 +36,11 @@ protected int getBehaviour() {
         // index from each hasher. The result is there may still be duplicates.
         return 0;
     }
+
+

Review Comment:
   Remove extra line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends AbstractIndexProducerTest {
+
+    // selecting 11 items from the range [0,9] will cause a collision
+    private Shape shape = Shape.fromKM(11, 10);
+
+    @Override
+    protected IndexProducer createProducer() {
+        return new HasherCollection(new IncrementingHasher(1, 1), new IncrementingHasher(2, 2)).absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected IndexProducer createEmptyProducer() {
+        return new HasherCollection().absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // HasherCollection allows duplicates and may be unordered

Review Comment:
   Update comment as duplicates are not allowed from absoluteUniqueIndices.
   
   Q. Should forEach also be distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherTest.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromUniqueHasherTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new IncrementingHasher(4, 8).uniqueIndices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(NullHasher.INSTANCE.indices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return AS_ARRAY_DISTINCT;

Review Comment:
   Update the comment as this does not allow duplicates
   
   Q. Is forEach also distinct?



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+

Review Comment:
   Remove empty line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1002411748


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,46 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assumptions.assumeTrue;
 
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.apache.commons.collections4.bag.TreeBag;
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
 
+    /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is ordered. */
+    protected static final int FOR_EACH_COUNT_ORDERED = 0x10;

Review Comment:
   In the interest of simplicity, can these be dropped?
   
   I previously stated that we should only have them if the behaviour of a BitCountProducer will be different from an IndexProducer. From what I can see in the tests these are used in 3 tests, and only when `FOR_EACH_ORDERED` and `FOR_EACH_DISTINCT` from the AbstractIndexProducerTest are also used. Dropping these constants will reduce maintenance and ensure the flags are correctly set for BitCount and Index producers.
   
   You can add a note in the test to state that the same flags are reused.
   
   I think the simplest way to do this is to update the flags to be private:
   ```Java
   /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is ordered.
    * This flag currently reuses the value from IndexProducer. At present no implementations
    * exhibit different behaviour for the two interfaces. This may change in the future and
    * an explicit flag value will be required. */
   private static final int FOR_EACH_COUNT_ORDERED = FOR_EACH_ORDERED;
   /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is distinct.
    * This flag currently reuses the value from IndexProducer. At present no implementations
    * exhibit different behaviour for the two interfaces. This may change in the future and
    * an explicit flag value will be required. */
   private static final int FOR_EACH_COUNT_DISTINCT = FOR_EACH_DISTINCT;
   ```
   An alternative would be to have a flag for unspecified behaviour. This must be explicitly set by implementing tests:
   ```
   FOR_EACH_COUNT_UNSPECIFIED = 0x10;
   FOR_EACH_COUNT_ORDERED = 0x20;
   FOR_EACH_COUNT_DISTINCT = 0x40;
   ```
   The abstract test can then check if the behaviour is unspecified. If true then the other flags must be unset. If false then at least one of the other flags must be set. I think this is more work and not required, especially since we are testing the behaviour that is not a part of the contract given in the javadoc of the main implementations. This is more of a behaviour sense check of the implementations.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +72,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 1.

Review Comment:
   I was considering why do we need to do this: convert an IndexProducer to a BitCountProducer. It would be to add it to a CountingBloomFilter. In this case the conversion does not remove duplicate indices. This means that the BitCountProducer can increment the count of an index more than 1 when adding a single IndexProducer.
   
   This has the following consequences:
   - The count at each index of a CountingBloomFilter does not represent the number of individual hashes added to the filter that had that index enabled. The count is bounded by the number of hashes as a lower limit, i.e. it may be higher.
   - Removal of an IndexProducer from a counting bloom filter must use the same method to construct the BitCountProducer as the addition in order to use the exact same index->count mapping.
   
   I am fine with this behaviour. Filtering an IndexProducer to unique indices is extra work. The `IndexFilter` class helps with this but requires a `Shape`. So it cannot be done in the BitCountProducer interface which has no Shape; other mechanisms to filter the IndexProducer without a bound on the index would be inefficient.
   
   So perhaps we require a bit of extra documentation here to serve as a user-beware warning:
   ```
    * <p>Note that the BitCountProducer does not remove duplicates. Any use of the
    * BitCountProducer to create an aggregate mapping of index to counts, such as a
    * CountingBloomFilter, should use the same BitCountProducer in both add and
    * subtract operations to maintain consistency.
   ```
   I believe this is consistent with the new wording in the javadoc header for the BitCountProducer interface.
   
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +71,89 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
+
+    @Test
+    public void testForEachCountValues() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public final void testBehaviourForEachCount() {

Review Comment:
   Add a javadoc note:
   ```Java
   /**
    * Test the behaviour of {@link BitCountProducer#forEachCount(BitCountConsumer)} with respect
    * to ordered and distinct indices. Currently the behaviour is assumed to be the same as
    * {@link IndexProducer#forEachIndex(java.util.function.IntPredicate)}.
    */
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();

Review Comment:
   As before, assumeTrue should be in the line after you get the flags.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997962908


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();
+        Arrays.stream(createProducer().asIndexArray()).boxed().forEach( lst::add );
+        for (int i : getExpectedIndices()) {
+            assertTrue( lst.contains(i), "Missing "+i );
+        }
+    }
+
     @Test
     public final void testForEachIndex() {
+        //IndexProducer producer = createProducer();

Review Comment:
   Remove commented out producer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1001024345


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r972933323


##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:

Review Comment:
   <p>



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +60,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 1.

Review Comment:
   I would add: Duplicate indices are not expected to be aggregated. Duplicates will be output by the producer and each will be associated with a count of 1.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +48,37 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
-    }
 
     @Test
-    public final void testForEachCount() {
-
+    public final void testForEachCountResults() {
         assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
         assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
-        }
+        assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
+        assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    protected abstract int[][] getExpectedBitCount();

Review Comment:
   This should be commented as the expected count for the method `forEachCount`. It should also be moved above the test methods to put all abstract methods in the same location.
   
   Note that this test for BitCountProducer extends the test for IndexProducer. There is some mismatch here between the tests. The AbstractIndexProducerTest verifies that forEachIndex and asIndexArray are consistent (they output the same indices, but the order and uniqueness does not matter). Each is then tested for its behaviour for ordering and uniqueness; these may be different for the two methods. However I note that there is no test for what the expected indices should actually be, i.e. there is not a `protected abstract int[] getExpectedIndices()` and you have added this method. However note that this method for the expected indices should be used to check the distinct indices and without ordering requirements since the behaviour is separately tested.
   
   The AbstractBitCountProducerTest extends the AbstractIndexProducerTest. So we know that the IndexProducer behaviour is tested. But here we now test the expected order and distinctness of the bit counts in one method. I note that there is only one method for the BitCountProducer and so consistency between different output of the counts in not an issue. However a test should be added for consistency between the bit counts and the indices.
   
   For parity with AbstractIndexProducerTest I created this implementation:
   ```Java
   public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
       /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is ordered. */
       protected static final int FOR_EACH_COUNT_ORDERED = 0x10;
       /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is distinct. */
       protected static final int FOR_EACH_COUNT_DISTINCT = 0x20;
   
       /**
        * A testing BitCountConsumer that always returns true.
        */
       private static final BitCountConsumer TRUE_CONSUMER = (i, j) -> true;
       /**
        * A testing BitCountConsumer that always returns false.
        */
       private static final BitCountConsumer FALSE_CONSUMER = (i, j) -> false;
   
       /**
        * Creates a producer with some data.
        * @return a producer with some data
        */
       @Override
       protected abstract BitCountProducer createProducer();
   
       /**
        * Creates an producer without data.
        * @return a producer that has no data.
        */
       @Override
       protected abstract BitCountProducer createEmptyProducer();
   
       // Document me ...
       protected abstract int[][] getExpectedBitCount();
   
       @Test
       public final void testForEachCountPredicates() {
           BitCountProducer populated = createProducer();
           BitCountProducer empty = createEmptyProducer();
   
           assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
           assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
   
           assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
           assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
       }
   
       @Test
       public final void testEmptyBitCountProducer() {
           BitCountProducer empty = createEmptyProducer();
           int ary[] = empty.asIndexArray();
           Assertions.assertEquals(0, ary.length);
           assertTrue(empty.forEachCount((i, j) -> {
               throw new AssertionError("forEachCount consumer should not be called");
           }));
       }
   
       @Test
       public final void testIndexConsistency() {
           BitCountProducer producer = createProducer();
           BitSet bs1 = new BitSet();
           BitSet bs2 = new BitSet();
           producer.forEachIndex(i -> {
               bs1.set(i);
               return true;
           });
           producer.forEachCount((i, j) -> {
               bs2.set(i);
               return true;
           });
           Assertions.assertEquals(bs1, bs2);
       }
   
       @Test
       public void testForEachCount() {
           // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
           final TreeBag<Integer> expected = new TreeBag<>();
           Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
           final TreeBag<Integer> actual = new TreeBag<>();
           createProducer().forEachCount((i, j) -> actual.add(i, j));
           assertEquals(expected, actual);
       }
   
       @Test
       public final void testBehaviourForEachCount() {
           int flags = getBehaviour();
           IntList list = new IntList();
           createProducer().forEachCount((i, j) -> list.add(i));
           int[] actual = list.toArray();
           if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
               int[] expected = Arrays.stream(actual).sorted().toArray();
               Assertions.assertArrayEquals(expected, actual);
           }
           if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
               long count = Arrays.stream(actual).distinct().count();
               Assertions.assertEquals(count, actual.length);
           }
           int[] expected = getExpectedForEach();
           Assertions.assertArrayEquals( expected, actual);
       }
   
       @Test
       public void testForEachCountEarlyExit() {
           int[] passes = new int[1];
           assertTrue(createEmptyProducer().forEachCount((i, j) -> {
               passes[0]++;
               return false;
           }));
           Assertions.assertEquals(0, passes[0]);
   
           assertFalse(createProducer().forEachCount((i, j) -> {
               passes[0]++;
               return false;
           }));
           Assertions.assertEquals(1, passes[0]);
       }
   }
   ```
   This test passes for all implementations. Note that I did not update all the tests to use the new flags marking the FOR_EACH_COUNT behaviour. They work with the current FOR_EACH behaviour flags, i.e. the behaviour of forEachCount is the same as forEachIndex for all implementations. So perhaps this can be dropped and we document the test to assume the forEach methods have the same behaviour.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:
+ * <ul>
+ * <li>that for every unique value produced by the IndexProducer there will be at least one
+ * index in the BitCountProducer.</li>
+ * <li>that the total count of a specific value produced by the IndexProducer will equal the
+ * total of the counts in the BitCountProducer for that index.</li>
+ * </ul>
+ * Example:
+ *
+ * An IndexProducer that generates the values [1,2,3,1,5,6] can be represented by a BitCountProducer

Review Comment:
   <p>



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromDefaultIndexProducerTest.java:
##########
@@ -41,24 +37,19 @@ protected int getBehaviour() {
         return AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
     }
 
-    @Test
-    @Disabled("Current behaviour will return the same index twice, each with a count of 1")
-    public final void testFromIndexProducer() {
-
-        BitCountProducer producer = createProducer();
-        Map<Integer, Integer> m = new HashMap<>();
+    @Override
+    protected int[][] getExpectedBitCount() {
+        return new int[][]{{0, 1}, {63, 1}, {1, 1}, {1, 1}, {64, 1}, {127, 1}, {128, 1}};
+    }
 
-        producer.forEachCount((i, v) -> {
-            m.put(i, v);
-            return true;
-        });
+    @Override
+    protected int[] getExpectedIndex() {
+        return expected;

Review Comment:
   If the Abstract test is modified as suggested to separately test behaviour and the expected indices then this can `return data`.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -29,6 +29,26 @@ public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
 
     private int[] values = generateIntArray(10, 512);
 
+    @Override
+    protected int[] getExpectedIndex() {
+        int last = -1;
+        IntList lst = new IntList();
+        for (int idx : values) {
+            if (idx != last) {

Review Comment:
   This is not eliminating duplicates (which I presume is the intention) since `values` is not sorted. This manifested as a flaky test when the expected indices array did not match the actual.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();

Review Comment:
   I think this should be used in a new test that only checks the same indices are returned. Other tests are used to determine ordering and uniqueness.
   ```Java
       @Test
       public final void testForEachIndex() {
           IndexProducer producer = createProducer();
           BitSet bs1 = new BitSet();
           BitSet bs2 = new BitSet();
           Arrays.stream(getExpectedIndex()).forEach(bs1::set);
           createProducer().forEachIndex(i -> {
               bs2.set(i);
               return true;
           });
           Assertions.assertEquals(bs1, bs2);
       }
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {
+        return getExpectedIndex();
+    }
+
     @Test
     public final void testForEachIndex() {

Review Comment:
   testForEachIndexPredicates



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:
+ * <ul>
+ * <li>that for every unique value produced by the IndexProducer there will be at least one
+ * index in the BitCountProducer.</li>
+ * <li>that the total count of a specific value produced by the IndexProducer will equal the
+ * total of the counts in the BitCountProducer for that index.</li>
+ * </ul>
+ * Example:

Review Comment:
   <p>



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {

Review Comment:
   This method is not required. The indices are checked in different tests: the indices are correct; the indices may be sorted; the indices may be distinct.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
+    int[] expected = {1, 2, 4, 5, 6, 8};

Review Comment:
   Updating the abstract test removes the requirement for `expected`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -144,12 +148,13 @@ public final void testBehaviourAsIndexArray() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedIndex();

Review Comment:
   This should not be in the behaviour test.
   
   Note that I did observe failures in the DefaultIndexProducerTest on this method. When duplicates are randomly generated this method will fail.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSparseBloomFilterTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromSparseBloomFilterTest extends AbstractBitCountProducerTest {
+
+    protected Shape shape = Shape.fromKM(17, 72);
+
+    @Override
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(0, 1);
+        BloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(hasher);
+        return BitCountProducer.from(bf);
+

Review Comment:
   Remove empty line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -161,6 +166,8 @@ public final void testBehaviourForEach() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   This should not be in the behaviour test



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherTest.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromHasherTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return BitCountProducer.from(new IncrementingHasher(0, 1).indices(Shape.fromKM(17, 72)));

Review Comment:
   I would update the test to use something other than IncrementingHasher(0, 1), perhaps IncrementingHasher(3, 1), or IncrementingHasher(3, 2). This would check a non-sequential set of indices.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997205831


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -29,6 +29,26 @@ public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
 
     private int[] values = generateIntArray(10, 512);
 
+    @Override
+    protected int[] getExpectedIndex() {
+        int last = -1;
+        IntList lst = new IntList();
+        for (int idx : values) {
+            if (idx != last) {

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1002454889


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +71,89 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
+
+    @Test
+    public void testForEachCountValues() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public final void testBehaviourForEachCount() {

Review Comment:
   added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008717560


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherTest.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromUniqueHasherTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new IncrementingHasher(4, 8).uniqueIndices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(NullHasher.INSTANCE.indices(Shape.fromKM(17, 72)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return AS_ARRAY_DISTINCT;

Review Comment:
   yes, fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 2)).uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(NullHasher.INSTANCE.uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008717713


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(2, 2)).absoluteUniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new HasherCollection().uniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // Hasher allows duplicates and may be unordered
+        return 0;

Review Comment:
   Yes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008718171


##########
src/main/java/org/apache/commons/collections4/bloomfilter/Hasher.java:
##########
@@ -49,7 +49,8 @@ public interface Hasher {
      * Creates an IndexProducer of unique indices for this hasher based on the Shape.
      *
      * <p>This is like the `indices(Shape)` method except that it adds the guarantee that no
-     * duplicate values will be returned</p>
+     * duplicate values will be returned.  The indices produced are equivalent those returned

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1014124634


##########
src/main/java/org/apache/commons/collections4/bloomfilter/IndexProducer.java:
##########
@@ -64,6 +65,11 @@ public boolean forEachIndex(IntPredicate predicate) {
                 }
                 return true;
             }
+
+            @Override
+            public int[] asIndexArray() {
+                return Arrays.copyOf( values, values.length );

Review Comment:
   Better served by `return values.clone()`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override

Review Comment:
   On line 63 there is the text `creates an producer`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    @Test
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
+
+    @Test
+    public void testForEachCountValues() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }

Review Comment:
   Formatting: move this to the next line to be `});`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -106,6 +120,27 @@ public boolean forEachIndex(IntPredicate consumer) {
         };
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices across all the contained

Review Comment:
   `of comprising`: drop the `of`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    @Test
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");

Review Comment:
   `Assertions.fail("forEachCount ...");`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.

Review Comment:
   `The index`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(7, 2)).absoluteUniqueIndices(Shape.fromKM(5, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new HasherCollection().absoluteUniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        return INDICES_DISTINCT |  INDICES_DISTINCT;

Review Comment:
   Duplicate flags.
   
   Note this should not contain INDICES_ORDERED if it is the only test for the absoluteUniqueIndices(). This is because the HasherCollection does not ensure ordering. If I change the test to this it fails (as expected) if the ordering flag is set:
   ```Java
   return BitCountProducer.from(new HasherCollection(
           new IncrementingHasher(7, 2),
           new IncrementingHasher(1, 1)
       ).absoluteUniqueIndices(Shape.fromKM(5, 10)));
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherCollectionTest.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps

Review Comment:
   This does not wrap



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromArrayCountingBloomFilterTest.java:
##########
@@ -36,6 +36,18 @@ protected BitCountProducer createEmptyProducer() {
     @Override
     protected int getBehaviour() {
         // CountingBloomFilter based on an array will be distinct and ordered
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   I think I can see why you have duplicate flags. A simple find and replace for the previous two different flags consolidated to one flag has not worked.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherTest.java:
##########
@@ -16,21 +16,32 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromHasherTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromHasherTest extends AbstractBitCountProducerTest {
 
     @Override
-    protected IndexProducer createProducer() {
-        return new IncrementingHasher(0, 1).indices(Shape.fromKM(17, 72));
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps

Review Comment:
   False. No collisions or wrapping



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -16,21 +16,28 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromIntArrayTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return IndexProducer.fromIndexArray(new int[0]);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(IndexProducer.fromIndexArray(new int[0]));
     }
 
     @Override
-    protected IndexProducer createProducer() {
-        return IndexProducer.fromIndexArray(new int[] { 6, 8, 1, 2, 4, 4, 5 });
+    protected BitCountProducer createProducer() {
+        return BitCountProducer.from(IndexProducer.fromIndexArray(data));
     }
 
     @Override
     protected int getBehaviour() {
         // Delegates to the default asIndexArray which is distinct and ordered

Review Comment:
   Comment does not match the flags. Delete the comment as the flags are correct.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSimpleBloomFilterTest.java:
##########
@@ -16,26 +16,31 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromSimpleBloomFilterTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromSimpleBloomFilterTest extends AbstractBitCountProducerTest {
 
     protected Shape shape = Shape.fromKM(17, 72);
 
     @Override
-    protected IndexProducer createProducer() {
-        Hasher hasher = new IncrementingHasher(0, 1);
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(3, 2);
         BloomFilter bf = new SimpleBloomFilter(shape);
         bf.merge(hasher);
-        return bf;
+        return BitCountProducer.from(bf);
     }
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return new SimpleBloomFilter(shape);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new SimpleBloomFilter(shape));
     }
 
     @Override
     protected int getBehaviour() {
         // BloomFilter based on a bit map array will be distinct and ordered
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   More duplicate flags



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSparseBloomFilterTest.java:
##########
@@ -16,29 +16,33 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromSparseBloomFilterTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromSparseBloomFilterTest extends AbstractBitCountProducerTest {
 
     protected Shape shape = Shape.fromKM(17, 72);
 
     @Override
-    protected IndexProducer createProducer() {
-        Hasher hasher = new IncrementingHasher(0, 1);
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(4, 7);
         BloomFilter bf = new SparseBloomFilter(shape);
         bf.merge(hasher);
-        return bf;
-
+        return BitCountProducer.from(bf);
     }
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return new SparseBloomFilter(shape);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new SparseBloomFilter(shape));
     }
 
     @Override
     protected int getBehaviour() {
         // A sparse BloomFilter will be distinct but it may not be ordered.
-        // Currently the ordered behaviour is asserted as the implementation uses
+        // Currently the ordered behavior is asserted as the implementation uses
         // an ordered TreeSet. This may change in the future.
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   Duplicate flags



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -47,8 +52,7 @@ public boolean forEachIndex(IntPredicate predicate) {
 
     @Override
     protected int getBehaviour() {
-        // The default method streams a BitSet so is distinct and ordered.

Review Comment:
   Note: This test has removed some testing of the default methods in IndexProducer.
   
   It should be named `IndexProducerFromIndexArrayTest`. It is specifically targeting the default implementation of the IndexProducer when created from an index array. In that case the `asIndexArray()` method returns a clone of the original indices used to create it.
   
   A `DefaultIndexProducerTest` would test the default implementation of `asIndexArray()` and look like this.
   
   ```Java
   public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
   
       /** Make forEachIndex unordered and contain duplicates. */
       private int[] values = {10, 1, 10, 1};
   
       @Override
       protected int[] getExpectedIndices() {
           return values;
       }
   
       @Override
       protected IndexProducer createProducer() {
           return new IndexProducer() {
               @Override
               public boolean forEachIndex(IntPredicate predicate) {
                   Objects.requireNonNull(predicate);
                   for (int i : values) {
                       if (!predicate.test(i)) {
                           return false;
                       }
                   }
                   return true;
               }
           };
       }
   
       @Override
       protected IndexProducer createEmptyProducer() {
           return new IndexProducer() {
               @Override
               public boolean forEachIndex(IntPredicate predicate) {
                   Objects.requireNonNull(predicate);
                   return true;
               }
           };
       }
   
       @Override
       protected int getBehaviour() {
           // The default method streams a BitSet so is distinct and ordered.
           // However the forEachIndex may not be distinct and ordered and
           // the test cannot distinguish the two cases.
           return 0;
       }
   }
   ```
   
   Note that this highlights a problem with consolidating all the behaviour flags. Currently the test suite cannot be used to test the asIndexArray behaviour is different from the forEachIndex behaviour.
   
   I think we should revert the consolidation of the flags. However to avoid the duplication of flag setting through the various test cases we can change the AbstractIndexProducerTest test to do this:
   ```Java
   public abstract class AbstractIndexProducerTest {
   
       protected static final int ORDERED = 0x1;
       protected static final int DISTINCT = 0x2;
   
       protected abstract int getAsIndexArrayBehaviour();
   
       protected int getForEachIndexBehaviour() {
           return getAsIndexArrayBehaviour();
       }
   
   }
   ```
   And AbtractBitCountProducerTest:
   ```Java
   public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
   
       protected int getForEachCountBehaviour() {
           return getAsIndexArrayBehaviour();
       }
   
   }
   ```
   
   Then get the correct flag in the relevant test. It provides the ability to override each test behaviour individually but will default to the same behaviour if not explicitly changed.
   
   Doing this change (DefaultIndexProducerTest and IndexProducerFromIndexArrayTest) should remove the requirement for the additional test `testFromIndexArray`. The method `testFromBitMapProducer` is already covered in IndexProducerFromBitMapProducerTest.
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.
+     * If the index array is not distinct then all elements returned by the {@code getExpectedIndices()}
+     * method, including duplicates, are expected to be returned by the {@code asIndexArray()} method.
+     */
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
-        if ((flags & AS_ARRAY_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & AS_ARRAY_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+            // if the array is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();

Review Comment:
   Since the order does not matter (and we have already checked the ordered behaviour):
   
   ```Java
   // This is modified so use a copy
   int[] expected = getExpectedIndices().clone();
   Arrays.sort(expected);
   Arrays.sort(actual);
   Assertions.assertArrayEquals(expected, actual);
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.
+     * If the index array is not distinct then all elements returned by the {@code getExpectedIndices()}
+     * method, including duplicates, are expected to be returned by the {@code asIndexArray()} method.
+     */
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
-        if ((flags & AS_ARRAY_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & AS_ARRAY_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+            // if the array is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();
+            Arrays.stream(createProducer().asIndexArray()).boxed().forEach( lst::add );
+            for (int i : getExpectedIndices()) {
+                assertTrue( lst.contains(i), "Missing "+i );
+                lst.remove( Integer.valueOf(i));
+            }
+            assertTrue(lst.isEmpty());
         }
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.forEachIndex()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * The order is assumed to follow the order produced by {@code IndexProducer.asIndexArray()}.
+     */
     @Test
-    public final void testBehaviourForEach() {
+    public final void testBehaviourForEachIndex() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);
         IntList list = new IntList();
         createProducer().forEachIndex(list::add);
         int[] actual = list.toArray();
-        if ((flags & FOR_EACH_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & FOR_EACH_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+         // if forEach is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();

Review Comment:
   ```Java
   int[] expected = getExpectedIndices().clone();
   Arrays.sort(expected);
   Arrays.sort(actual);
   Assertions.assertArrayEquals(expected, actual);
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromBitmapProducerTest.java:
##########
@@ -49,10 +49,15 @@ protected IndexProducer createProducer() {
         return IndexProducer.fromBitMapProducer(producer);
     }
 
+    @Override
+    protected int[] getExpectedIndices() {
+        return new int[]{0, 65, 128, 129};
+    }
+
     @Override
     protected int getBehaviour() {
         // Bit maps will be distinct. Conversion to indices should be ordered.
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   Duplicates



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997202283


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromDefaultIndexProducerTest.java:
##########
@@ -41,24 +37,19 @@ protected int getBehaviour() {
         return AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
     }
 
-    @Test
-    @Disabled("Current behaviour will return the same index twice, each with a count of 1")
-    public final void testFromIndexProducer() {
-
-        BitCountProducer producer = createProducer();
-        Map<Integer, Integer> m = new HashMap<>();
+    @Override
+    protected int[][] getExpectedBitCount() {
+        return new int[][]{{0, 1}, {63, 1}, {1, 1}, {1, 1}, {64, 1}, {127, 1}, {128, 1}};
+    }
 
-        producer.forEachCount((i, v) -> {
-            m.put(i, v);
-            return true;
-        });
+    @Override
+    protected int[] getExpectedIndex() {
+        return expected;

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherTest.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromHasherTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return BitCountProducer.from(new IncrementingHasher(0, 1).indices(Shape.fromKM(17, 72)));

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008716888


##########
src/test/java/org/apache/commons/collections4/bloomfilter/UniqueIndexProducerFromHasherCollectionTest.java:
##########
@@ -36,4 +36,11 @@ protected int getBehaviour() {
         // index from each hasher. The result is there may still be duplicates.
         return 0;
     }
+
+

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008718098


##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>

Review Comment:
   fixed



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert merged pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert merged PR #335:
URL: https://github.com/apache/commons-collections/pull/335


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008717823


##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -141,29 +172,16 @@ public boolean forEachIndex(IntPredicate consumer) {
 
         @Override
         public int[] asIndexArray() {
-            List<int[]> lst = new ArrayList<>();
-            int[] count = new int[1];
-            /*
-             * This method needs to return duplicate indices
-             */
-            for (Hasher hasher : hashers) {
-                int[] ary = hasher.indices(shape).asIndexArray();
-                lst.add(ary);
-                count[0] += ary.length;
-            }
-            if (lst.isEmpty()) {
-                return new int[0];
-            }
-            if (lst.size() == 1) {
-                return lst.get(0);
-            }
-            int[] result = new int[count[0]];
-            int offset = 0;
-            for (int[] ary : lst) {
-                System.arraycopy(ary, 0, result, offset, ary.length);
-                offset += ary.length;
-            }
-            return result;
+            int[] result = new int[shape.getNumberOfHashFunctions()*hashers.size()];

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] codecov-commenter commented on pull request #335: Collections 384 bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1242811645

   # [Codecov](https://codecov.io/gh/apache/commons-collections/pull/335?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#335](https://codecov.io/gh/apache/commons-collections/pull/335?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cfa8944) into [master](https://codecov.io/gh/apache/commons-collections/commit/3bc37dcd6b7f4a1b77ee7ce64d5cf916681eed6e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3bc37dc) will **increase** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #335      +/-   ##
   ============================================
   + Coverage     85.93%   85.98%   +0.05%     
   - Complexity     4669     4671       +2     
   ============================================
     Files           289      289              
     Lines         13445    13445              
     Branches       1977     1977              
   ============================================
   + Hits          11554    11561       +7     
   + Misses         1327     1323       -4     
   + Partials        564      561       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/commons-collections/pull/335?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ons/collections4/bloomfilter/BitCountProducer.java](https://codecov.io/gh/apache/commons-collections/pull/335/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL0JpdENvdW50UHJvZHVjZXIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...commons/collections4/map/AbstractReferenceMap.java](https://codecov.io/gh/apache/commons-collections/pull/335/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L21hcC9BYnN0cmFjdFJlZmVyZW5jZU1hcC5qYXZh) | `91.48% <0.00%> (+2.59%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-384][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998720775


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -43,7 +48,7 @@ public abstract class AbstractIndexProducerTest {
     /**
      * An expandable list of int values.
      */
-    private static class IntList {
+    protected static class IntList {

Review Comment:
   When I checked you had not pushed the entire AbstractBitCountProducerTest and so it was not (yet) used in that class. I see the changes now. Let me know when you have fixed everything and I'll review again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997172774


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();

Review Comment:
   added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-384][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998703957


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,29 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {

Review Comment:
   I don't know what happened.  I have, I think, fixed it now.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -134,7 +167,6 @@ public final void testConsistency() {
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-384][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998703721


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -43,7 +48,7 @@ public abstract class AbstractIndexProducerTest {
     /**
      * An expandable list of int values.
      */
-    private static class IntList {
+    protected static class IntList {

Review Comment:
   Must be protected for AbstractBitCountProducer usage



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1000573446


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -134,8 +176,8 @@ public final void testConsistency() {
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
+        assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);

Review Comment:
   assumeTrue should be the line after you get the flags. Otherwise you create the Producer for no reason.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();

Review Comment:
   This could do with:
   `assumeTrue((flags & (FOR_EACH_COUNT_ORDERED | FOR_EACH_COUNT_DISTINCT)) != 0);`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();
+        createProducer().forEachCount((i, j) -> list.add(i));
+        int[] actual = list.toArray();
+        if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
+            int[] expected = Arrays.stream(actual).sorted().toArray();
+            Assertions.assertArrayEquals(expected, actual);
+        }
+        if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
+            long count = Arrays.stream(actual).distinct().count();
+            Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   I do not understand why we need this method: `getExpectedForEach()`.
   
   The forEachCount should satisfy the documented contract in the interface:
   
   - that for every unique value produced by the IndexProducer there will be at least one index in the BitCountProducer.
   - that the total count of a specific value produced by the IndexProducer will equal the total of the counts in the BitCountProducer for that index
   
   - Duplicate indices are not required to be aggregated. Duplicates may be output by the producer as noted in the class javadoc (i.e. above).
   
   
   These can be tested without a separate `getExpectedForEach()` array.
   
   Add this method:
   ```Java
       @Test
       public final void testIndexCountConsistency() {
           BitCountProducer producer = createProducer();
           final TreeBag<Integer> expected = new TreeBag<>();
           final TreeBag<Integer> actual = new TreeBag<>();
           producer.forEachIndex(i -> {
               expected.add(i);
               return true;
           });
           producer.forEachCount((i, j) -> {
               actual.add(i, j);
               return true;
           });
           assertEquals(expected, actual);
       }
   ```
   
   Finds the following errors:
   
   - BitCountProducerFromArrayCountingBloomFilterTest
   
   So this is not satisfying the BitCountProducer contract. However I do not think we wish to change the implementation. I think this test is highlighting that we should change the interface definition, specifically this statement which we should not uphold:
   ```
   "that the total count of a specific value produced by the IndexProducer will equal the total of the counts in the BitCountProducer for that index"
   ```
   
   I think we can drop this statement, perhaps to be replaced with: there are no guarantees that the count of indices output by the IndexProducer and total counts of the BitCountProducer are the same; only that the unique indices are the identical.
   
   The rest of the tests just verify whether indices are ordered and distinct. You do not need to provide an expected array of the indices output from `forEachIndex` since the order does not matter. The interface contracts are stating that IndexProducer and BitCountProducer are consistent, but they are not mandated to output in a specific order, or even a matched order between them.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -149,7 +191,6 @@ public final void testBehaviourAsIndexArray() {
     @Test
     public final void testBehaviourForEach() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);

Review Comment:
   This Assumptions.assumeTrue has not been reinstated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1001083298


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();
+        createProducer().forEachCount((i, j) -> list.add(i));
+        int[] actual = list.toArray();
+        if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
+            int[] expected = Arrays.stream(actual).sorted().toArray();
+            Assertions.assertArrayEquals(expected, actual);
+        }
+        if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
+            long count = Arrays.stream(actual).distinct().count();
+            Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   I think this is what we have:
   
   ```
    * Defines a mapping of index to counts.
    *
    * <p>Note that a BitCountProducer may return duplicate indices and may be unordered.
    *
    * <p>Implementations must guarantee that:
    *
    * <ul>
    * <li>The mapping of index to counts is the combined sum of counts at each index.
    * <li>For every unique value produced by the IndexProducer there will be at least one matching
    * index and count produced by the BitCountProducer.
    * <li>The BitCountProducer will not generate indices that are not output by the IndexProducer.
    * </ul>
    *
    * <p>Note that implementations that do not output duplicate indices for BitCountProducer and
    * do for IndexProducer, or vice versa, are consistent if the distinct indices from each are
    * the same.
    *
    * <p>For example the mapping [(1,2),(2,3),(3,1)] can be output with many combinations including:
    * <pre>
    * [(1,2),(2,3),(3,1)]
    * [(1,1),(1,1),(2,1),(2,1),(2,1),(3,1)]
    * [(1,1),(3,1),(1,1),(2,1),(2,1),(2,1)]
    * [(3,1),(1,1),(2,2),(1,1),(2,1)]
    * ...
    * </pre>
   ```
   
   I am not sure we need to spell it out with the example. In this case perhaps less is more.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1001025470


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();
+        createProducer().forEachCount((i, j) -> list.add(i));
+        int[] actual = list.toArray();
+        if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
+            int[] expected = Arrays.stream(actual).sorted().toArray();
+            Assertions.assertArrayEquals(expected, actual);
+        }
+        if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
+            long count = Arrays.stream(actual).distinct().count();
+            Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   At first I thought you were wrong, but now I see we are in violent agreement and my documentation is poorly written! 
   
   Given the example at the bottom of the BitCountProducer interface documentation.
   
   ```
    * An IndexProducer that generates the values [1,2,3,1,5,6] can be represented by a BitCountProducer
    * that generates [(1,2),(2,1),(3,1),(5,1)(6,1)] or [(1,1),(2,1),(3,1),(1,1),(5,1),(6,1)] where the
    * entries may be in any order.
   ```
   
   How would you rephrase:
   
   ```
    * A BitCountProducer may return duplicate indices and may be unordered.
    *
    * The guarantees are:
    * <ul>
    * <li>that for every unique value produced by the IndexProducer there will be at least one
    * index in the BitCountProducer.</li>
    * <li>that the total count of a specific value produced by the IndexProducer will equal the
    * total of the counts in the BitCountProducer for that index.</li>
    * </ul>
   ```
   
   I have removed the `getExpectedForEach()` from the tests.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-384][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998702044


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -149,7 +181,6 @@ public final void testBehaviourAsIndexArray() {
     @Test
     public final void testBehaviourForEach() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997175203


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -144,12 +148,13 @@ public final void testBehaviourAsIndexArray() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedIndex();

Review Comment:
   removed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -161,6 +166,8 @@ public final void testBehaviourForEach() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   Moved to its own testForEachIndex



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997171493


##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +60,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 1.

Review Comment:
   added with reference to class javaodc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1003575545


##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -57,12 +88,22 @@ static BitCountProducer from(IndexProducer idx) {
             public boolean forEachCount(BitCountConsumer consumer) {
                 return idx.forEachIndex(i -> consumer.test(i, 1));
             }
+
+            @Override
+            public int[] asIndexArray() {

Review Comment:
   I presume this change is required otherwise the test suite fails.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1014603229


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -36,22 +38,41 @@ protected int[] getExpectedIndices() {
 
     @Override
     protected IndexProducer createProducer() {
-        return IndexProducer.fromIndexArray(values);
+        return new IndexProducer() {
+            @Override
+            public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
+                for (int i : values) {
+                    if (!predicate.test(i)) {
+                        return false;
+                    }
+                }
+                return true;
+            }
+        };
     }
 
     @Override
     protected IndexProducer createEmptyProducer() {
         return new IndexProducer() {
-
             @Override
             public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
                 return true;
             }
         };
     }
 
     @Override
-    protected int getBehaviour() {
+    protected int getAsIndexArrayBehaviour() {
+        // The default method streams a BitSet so is distinct and ordered.
+        // However the forEachIndex may not be distinct and ordered and

Review Comment:
   Again can you move this comment to `getForEachIndexBehaviour` as:
   ```Java
   // The forEachIndex output is intentionally unordered and contains duplicates
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBitCountProducerTest.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+import java.util.Arrays;
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+public class DefaultBitCountProducerTest extends AbstractBitCountProducerTest {
+
+    /** Make forEachIndex unordered and contain duplicates. */
+    private int[] values = {10, 1, 10, 1};
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return values;
+    }
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return new BitCountProducer() {
+            @Override
+            public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
+                for (int i : values) {
+                    if (!predicate.test(i)) {
+                        return false;
+                    }
+                }
+                return true;
+            }
+
+            @Override
+            public boolean forEachCount(BitCountConsumer consumer) {
+                int[] vals = values.clone();
+                Arrays.sort(vals);
+                for (int i : vals) {
+                    if (!consumer.test(i, 1)) {
+                        return false;
+                    }
+                }
+                return true;
+            }
+        };
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return new BitCountProducer() {
+            @Override
+            public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
+                return true;
+            }
+            @Override
+            public boolean forEachCount(BitCountConsumer consumer) {
+                return true;
+            }
+        };
+    }
+
+    @Override
+    protected int getAsIndexArrayBehaviour() {
+        // The default method streams a BitSet so is distinct and ordered.
+        // However the forEachIndex may not be distinct and ordered and

Review Comment:
   This comment is outdated. The comment should be moved to `getForEachIndexBehaviour` as:
   ```Java
   // The forEachIndex output is intentionally unordered and contains duplicates
   ```
   
   If you wish to test forEachCount can use different flags then add some comments:
   ```Java
   // Force to be ordered
   int[] vals = values.clone();
   Arrays.sort(vals);
   
   // ...
   
   // The forEachCount output is intentionally ordered and contains duplicates
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBitCountProducerTest.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+import java.util.Arrays;
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+public class DefaultBitCountProducerTest extends AbstractBitCountProducerTest {
+
+    /** Make forEachIndex unordered and contain duplicates. */
+    private int[] values = {10, 1, 10, 1};
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return values;
+    }
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return new BitCountProducer() {
+            @Override
+            public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
+                for (int i : values) {
+                    if (!predicate.test(i)) {
+                        return false;
+                    }
+                }
+                return true;
+            }
+
+            @Override
+            public boolean forEachCount(BitCountConsumer consumer) {
+                int[] vals = values.clone();

Review Comment:
   There are no comments as to why you are doing this. Was it for testing?
   
   I suggest leaving forEachCount to not be ordered and match the forEachIndex order. This should test the default implementations of BitCountProducer, not add in extra tests for convenience.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1301446517

   @aherbert The changes are in.  Have a look please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1002455005


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,46 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.junit.jupiter.api.Assumptions.assumeTrue;
 
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.apache.commons.collections4.bag.TreeBag;
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
 
+    /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is ordered. */
+    protected static final int FOR_EACH_COUNT_ORDERED = 0x10;

Review Comment:
   I'm not sure I like this, but I can't think of a good counter argument.  I could do the hand waving and say IndexProducer and BitCountProducer produce different orders but I can't see a reasonable example in any possible implementation I can come up with.  So, changes made.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1002454044


##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +72,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 1.

Review Comment:
   added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1288906258

   > Last commit was on Oct 21st. Is this correct or do you need to push again?
   
   Found the issue on my system.  Should be updated now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1001024115


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -134,8 +176,8 @@ public final void testConsistency() {
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
+        assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -149,7 +191,6 @@ public final void testBehaviourAsIndexArray() {
     @Test
     public final void testBehaviourForEach() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1014415306


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -47,8 +52,7 @@ public boolean forEachIndex(IntPredicate predicate) {
 
     @Override
     protected int getBehaviour() {
-        // The default method streams a BitSet so is distinct and ordered.

Review Comment:
   Corrected.  I also added `DefaultBitCountProducerTest` and removed duplicate test `UniqueIndexProducerFromHashCollectionTest` which was effectively a duplicate of `BitCountProducerFromUniqueHasherCollectionTest`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997963696


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -134,7 +167,6 @@ public final void testConsistency() {
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);

Review Comment:
   These Assumptions will skip the test if there is no known behaviour. Otherwise the test asserts nothing so there is no point running it.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,29 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {

Review Comment:
   I am not seeing the version of AbstractBitCountProducerTest that I provided. However you have noted in the comment that you 'moved and added code'. So I am not sure what you updated. I think all you need to do is paste in the test methods from the version I provided. In your current version the test asserts that the order from getExpectedBitCount is matched, including any duplicate indices. In my test class we separate the aggregation of the bit counts (testForEachCount) and the test for uniqueness and ordering (testBehaviourForEachCount) to respect the behaviour flag. There is also an early exit test. This thus makes the BitCount producer test more closely match the IndexProducer test.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -19,13 +19,18 @@
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.BitSet;
+import java.util.List;
 import java.util.function.IntPredicate;
 import org.junit.jupiter.api.Assertions;
-import org.junit.jupiter.api.Assumptions;
 import org.junit.jupiter.api.Test;
 
+/**
+ * Test for IndexProducer.
+ *

Review Comment:
   Remove extra blank line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();

Review Comment:
   Note that this method requires assumptions. The expected indices are dependent on the producer returned from createProducer, which is ultimately dependent on the shape of the Bloom filter.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,8 +98,36 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    /**
+     * Creates an array of expected indices.
+     * @return an array of expected indices.
+     */
+    protected abstract int[] getExpectedIndices();
+
+    @Test
+    public final void testAsIndexArrayValues() {
+        List<Integer> lst = new ArrayList<>();

Review Comment:
   This test is asserting the distinctness of indices using lst.contains (which is inefficient). If you only wish to test distinctness then use a BitSet not a List<Integer>. Use of a list (over a set) would be to test the ordering.
   
   Given we have a test for distinctness and ordering then I think this test should be:
   ```Java
   @Test
   public final void testAsIndexArrayValues() {
       BitSet bs = new BitSet();
       Arrays.stream(createProducer().asIndexArray()).forEach(bs::set);
       for (int i : getExpectedIndices()) {
           assertTrue(bs.get(i), () -> "Missing " + i);
       }
   }
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -43,7 +48,7 @@ public abstract class AbstractIndexProducerTest {
     /**
      * An expandable list of int values.
      */
-    private static class IntList {
+    protected static class IntList {

Review Comment:
   This can be private. It is not used in any other class.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -16,35 +16,29 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
 import org.apache.commons.collections4.bloomfilter.BitCountProducer.BitCountConsumer;
 import org.junit.jupiter.api.Test;
 
 public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
-
+    /**
+     * A testing BitCountConsumer that always returns true.
+     */
+    private static final BitCountConsumer TRUE_CONSUMER = (i, j) -> true;
     /**
      * A testing BitCountConsumer that always returns false.
      */
-    public static BitCountConsumer FALSE_CONSUMER = new BitCountConsumer() {
-
-        @Override
-        public boolean test(int index, int count) {
-            return false;
-        }
-    };
+    private static final BitCountConsumer FALSE_CONSUMER = (i, j) -> false;
 
     /**
-     * A testing BitCountConsumer that always returns true.
+     * Creates an array of integer pairs comprising the index and the expected count for the index.
+     * @return an array of integer pairs comprising the index and the expected count for the index.
      */
-    public static BitCountConsumer TRUE_CONSUMER = new BitCountConsumer() {
-
-        @Override
-        public boolean test(int index, int count) {
-            return true;
-        }
-    };
+    protected abstract int[][] getExpectedBitCount();

Review Comment:
   This method may not need to be abstract. In most cases where ordering and duplicates are not mandated then this implementation will be fine:
   ```Java
       protected int[][] getExpectedBitCount() {
           return Arrays.stream(getExpectedIndices()).mapToObj(x -> new int[] {x, 1}).toArray(int[][]::new);
       }
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -149,7 +181,6 @@ public final void testBehaviourAsIndexArray() {
     @Test
     public final void testBehaviourForEach() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);

Review Comment:
   Reinstate the Assumptions



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
+

Review Comment:
   Remove extra line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromIntArrayTest.java:
##########
@@ -18,19 +18,32 @@
 
 public class IndexProducerFromIntArrayTest extends AbstractIndexProducerTest {
 
+    int[] data = { 6, 8, 1, 2, 4, 4, 5 };
+    int[] expected = {1, 2, 4, 5, 6, 8};
+
     @Override
     protected IndexProducer createEmptyProducer() {
         return IndexProducer.fromIndexArray(new int[0]);
     }
 
     @Override
     protected IndexProducer createProducer() {
-        return IndexProducer.fromIndexArray(new int[] { 6, 8, 1, 2, 4, 4, 5 });
+        return IndexProducer.fromIndexArray(data);
     }
 
     @Override
     protected int getBehaviour() {
         // Delegates to the default asIndexArray which is distinct and ordered
         return AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
     }
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return expected;
+    }
+
+//    @Override

Review Comment:
   Remove commented out code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997174291


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {
+        return getExpectedIndex();
+    }
+
     @Test
     public final void testForEachIndex() {

Review Comment:
   renamed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -144,12 +148,13 @@ public final void testBehaviourAsIndexArray() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedIndex();

Review Comment:
   moved to testAsIndexArray



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997204328


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
+    int[] expected = {1, 2, 4, 5, 6, 8};

Review Comment:
   Removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r997174031


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {

Review Comment:
   removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r998691800


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
+

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromIntArrayTest.java:
##########
@@ -18,19 +18,32 @@
 
 public class IndexProducerFromIntArrayTest extends AbstractIndexProducerTest {
 
+    int[] data = { 6, 8, 1, 2, 4, 4, 5 };
+    int[] expected = {1, 2, 4, 5, 6, 8};
+
     @Override
     protected IndexProducer createEmptyProducer() {
         return IndexProducer.fromIndexArray(new int[0]);
     }
 
     @Override
     protected IndexProducer createProducer() {
-        return IndexProducer.fromIndexArray(new int[] { 6, 8, 1, 2, 4, 4, 5 });
+        return IndexProducer.fromIndexArray(data);
     }
 
     @Override
     protected int getBehaviour() {
         // Delegates to the default asIndexArray which is distinct and ordered
         return AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
     }
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return expected;
+    }
+
+//    @Override

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1288862048

   Let me see what happened.  In pushed this AM.  (or so I thought)
   
   
   On Mon, Oct 24, 2022, 11:57 Alex Herbert ***@***.***> wrote:
   
   > Last commit was on Oct 21st. Is this correct or do you need to push again?
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/commons-collections/pull/335#issuecomment-1288857212>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AASTVHQUXZFZCYJQ2RTT523WEZTPXANCNFSM6AAAAAAQJQDJ7M>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on pull request #335: Collections 384 bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on PR #335:
URL: https://github.com/apache/commons-collections/pull/335#issuecomment-1242904184

   @aherbert Described the BitCountProducer contract in the BitCountProducer javadoc, and added tests to show that they work as expected.  I hope I have not made too much of a hash of it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008716922


##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromAbsoluteUniqueHasherCollectionTest2.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class IndexProducerFromAbsoluteUniqueHasherCollectionTest2 extends AbstractIndexProducerTest {
+
+    // selecting 11 items from the range [0,9] will cause a collision
+    private Shape shape = Shape.fromKM(11, 10);
+
+    @Override
+    protected IndexProducer createProducer() {
+        return new HasherCollection(new IncrementingHasher(1, 1), new IncrementingHasher(2, 2)).absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected IndexProducer createEmptyProducer() {
+        return new HasherCollection().absoluteUniqueIndices(shape);
+    }
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
+    }
+
+    @Override
+    protected int getBehaviour() {
+        // HasherCollection allows duplicates and may be unordered

Review Comment:
   Updataed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008717627


##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008718041


##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -106,6 +120,23 @@ public boolean forEachIndex(IntPredicate consumer) {
         };
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices across all the contained
+     * hashers.
+     *
+     * <p>This is equivalent to an IndexProducer created from a Bloom filter that comprises all
+     * the contained hashers.</p>
+     *
+     * @param shape the shape of the desired Bloom filter.
+     * @return the iterator of integers
+     */
+    public IndexProducer absoluteUniqueIndices(final Shape shape) {
+        return consumer -> {
+            Objects.requireNonNull(consumer, "consumer");
+            return uniqueIndices(shape).forEachIndex(IndexFilter.create(shape, consumer));

Review Comment:
   implemented later version with a check for hashers.size() == 0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1008718061


##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -90,6 +90,20 @@ public IndexProducer indices(final Shape shape) {
         return new HasherCollectionIndexProducer(shape);
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices from each of the contained
+     * hashers.
+     *
+     * <p>This method may return duplicates if the collection of unique values from each of the contained
+     * hashers contain duplicates.  This is equivalent to creating Bloom filters for each contained hasher
+     * and returning concatenated IndexProducer from each.</p>
+     *
+     * <p>A BitCountProducer generated from this IndexProducer is equivalent to a BitCountProducer from a
+     * counting Bloom filter that was constructed from the contained hashers.<p>

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1014652758


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBitCountProducerTest.java:
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+import java.util.Arrays;
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+public class DefaultBitCountProducerTest extends AbstractBitCountProducerTest {
+
+    /** Make forEachIndex unordered and contain duplicates. */
+    private int[] values = {10, 1, 10, 1};
+
+    @Override
+    protected int[] getExpectedIndices() {
+        return values;
+    }
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return new BitCountProducer() {
+            @Override
+            public boolean forEachIndex(IntPredicate predicate) {
+                Objects.requireNonNull(predicate);
+                for (int i : values) {
+                    if (!predicate.test(i)) {
+                        return false;
+                    }
+                }
+                return true;
+            }
+
+            @Override
+            public boolean forEachCount(BitCountConsumer consumer) {
+                int[] vals = values.clone();

Review Comment:
   I should have reverted that.  It was to test that the forEachCount tests used the proper behaviour.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org