You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Claude Warren (Jira)" <ji...@apache.org> on 2022/06/15 08:32:00 UTC
[jira] [Updated] (COLLECTIONS-823) BloomFilter: Optimize ArrayCountingBloomFilter.ForEachBitMap
[ https://issues.apache.org/jira/browse/COLLECTIONS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Claude Warren updated COLLECTIONS-823:
--------------------------------------
Description:
Member
h3. !https://avatars.githubusercontent.com/u/886334?s=48&v=4|width=24,height=24! *[aherbert|https://github.com/aherbert]* [on 27 Feb|https://github.com/apache/commons-collections/pull/258#discussion_r812499923]
This converts all the non zero indices to a bitmap long[] array. But to do so requires using the {{forEachIndex}} method with a conditional boolean check on each loop iteration. I wonder if this should be brought inline for efficiency:
{noformat}
@Override
public boolean forEachBitMap(LongPredicate consumer) {
Objects.requireNonNull(consumer, "consumer");
long[] result = new long[BitMap.numberOfBitMaps(shape.getNumberOfBits())];
for (int i = 0; i < counts.length; i++) {
if (counts[i] != 0) {
// Avoids a second check on the predicate result in forEachIndex
BitMap.set(result, i);
}
}
return BitMapProducer.fromBitMapArray(result).forEachBitMap(consumer);
}
{noformat}
Or better yet, avoid the {{long[]}} array:
{noformat}
int blocksm1 = BitMap.numberOfBitMaps(shape.getNumberOfBits()) - 1;
int i = 0;
long value;
for (int j = 0; j < blocksm1; j++) {
value = 0;
for (int k = 0; k < Long.SIZE; k++) {
if (counts[i++] != 0) {
value |= BitMap.getLongBit(k);
}
}
if (!consumer.test(value)) {
return false;
}
}
// Final block
value = 0;
for (int k = 0; i < counts.length; k++) {
if (counts[i++] != 0) {
value |= BitMap.getLongBit(k);
}
}
return consumer.test(value);
{noformat}
was:
Member
h3. !https://avatars.githubusercontent.com/u/886334?s=48&v=4|width=24,height=24! *[aherbert|https://github.com/aherbert]* [on 27 Feb|https://github.com/apache/commons-collections/pull/258#discussion_r812499923]
This converts all the non zero indices to a bitmap long[] array. But to do so requires using the {{forEachIndex}} method with a conditional boolean check on each loop iteration. I wonder if this should be brought inline for efficiency:
@Override
public boolean forEachBitMap(LongPredicate consumer) {
Objects.requireNonNull(consumer, "consumer");
long[] result = new long[BitMap.numberOfBitMaps(shape.getNumberOfBits())];
for (int i = 0; i < counts.length; i++) {
if (counts[i] != 0) {
// Avoids a second check on the predicate result in forEachIndex
BitMap.set(result, i);
}
}
return BitMapProducer.fromBitMapArray(result).forEachBitMap(consumer);
}
Or better yet, avoid the {{long[]}} array:
int blocksm1 = BitMap.numberOfBitMaps(shape.getNumberOfBits()) - 1;
int i = 0;
long value;
for (int j = 0; j < blocksm1; j++) {
value = 0;
for (int k = 0; k < Long.SIZE; k++) {
if (counts[i++] != 0) {
value |= BitMap.getLongBit(k);
}
}
if (!consumer.test(value)) {
return false;
}
}
// Final block
value = 0;
for (int k = 0; i < counts.length; k++) {
if (counts[i++] != 0) {
value |= BitMap.getLongBit(k);
}
}
return consumer.test(value);
> BloomFilter: Optimize ArrayCountingBloomFilter.ForEachBitMap
> ------------------------------------------------------------
>
> Key: COLLECTIONS-823
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-823
> Project: Commons Collections
> Issue Type: Improvement
> Components: Collection
> Affects Versions: 4.5
> Reporter: Claude Warren
> Priority: Minor
> Labels: bloom-filter
>
>
>
> Member
> h3. !https://avatars.githubusercontent.com/u/886334?s=48&v=4|width=24,height=24! *[aherbert|https://github.com/aherbert]* [on 27 Feb|https://github.com/apache/commons-collections/pull/258#discussion_r812499923]
> This converts all the non zero indices to a bitmap long[] array. But to do so requires using the {{forEachIndex}} method with a conditional boolean check on each loop iteration. I wonder if this should be brought inline for efficiency:
> {noformat}
> @Override
> public boolean forEachBitMap(LongPredicate consumer) {
> Objects.requireNonNull(consumer, "consumer");
> long[] result = new long[BitMap.numberOfBitMaps(shape.getNumberOfBits())];
> for (int i = 0; i < counts.length; i++) {
> if (counts[i] != 0) {
> // Avoids a second check on the predicate result in forEachIndex
> BitMap.set(result, i);
> }
> }
> return BitMapProducer.fromBitMapArray(result).forEachBitMap(consumer);
> }
> {noformat}
> Or better yet, avoid the {{long[]}} array:
> {noformat}
> int blocksm1 = BitMap.numberOfBitMaps(shape.getNumberOfBits()) - 1;
> int i = 0;
> long value;
> for (int j = 0; j < blocksm1; j++) {
> value = 0;
> for (int k = 0; k < Long.SIZE; k++) {
> if (counts[i++] != 0) {
> value |= BitMap.getLongBit(k);
> }
> }
> if (!consumer.test(value)) {
> return false;
> }
> }
> // Final block
> value = 0;
> for (int k = 0; i < counts.length; k++) {
> if (counts[i++] != 0) {
> value |= BitMap.getLongBit(k);
> }
> }
> return consumer.test(value);
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)