Posted to dev@lucene.apache.org by Rob Audenaerde <ro...@gmail.com> on 2020/02/21 10:30:07 UTC
Use advance() in RandomSamplingFacetsCollector
In the code that estimates facet counts by taking random samples, this is
the inner loop:
  final DocIdSetIterator it = docs.bits.iterator();
  for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
    if (counter == randomIndex) {
      sampleDocs.set(doc);
    }
    counter++;
    if (counter >= limit) {
      counter = 0;
      limit = binSize;
      randomIndex = random.nextInt(binSize);
    }
  }
So it visits every matching document and keeps only one per bin, skipping
the rest along the way. But DocIdSetIterator also provides an 'advance'
method, and I thought maybe we could use that to iterate faster.
Something like this:
  final DocIdSetIterator it = docs.bits.iterator();
  int doc = it.nextDoc();
  if ((doc + randomIndex) < docs.totalHits) {
    for (doc = it.advance(doc + randomIndex); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.advance(doc + randomIndex)) {
      sampleDocs.set(doc);
      // The +1 avoids staying at the same document, which would not make sense.
      randomIndex = this.random.nextInt(binSize) + 1;
    }
  }
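One caveat that may be worth checking: advance(target) moves to the first document whose ID is greater than or equal to target, so the jump is measured in doc IDs, not in number of hits. The two loops therefore only sample at a similar rate when the matching doc IDs are dense. Below is a minimal, stand-alone sketch of that difference. It does not use Lucene: ArrayDocIterator is a hypothetical stand-in for a doc ID iterator over a sorted array, the initial values of counter, limit and randomIndex are assumptions, and the docs.totalHits guard is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class AdvanceSamplingSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc ID iterator; NOT Lucene's class. */
    static final class ArrayDocIterator {
        private final int[] docs; // matching doc IDs, sorted ascending
        private int pos = -1;

        ArrayDocIterator(int[] sortedDocs) { this.docs = sortedDocs; }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves past the current doc to the first doc ID >= target. */
        int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }
    }

    /** Counting loop: keeps exactly one hit out of every full bin of binSize hits. */
    static List<Integer> sampleByCounting(int[] docs, int binSize, Random random) {
        List<Integer> sample = new ArrayList<>();
        ArrayDocIterator it = new ArrayDocIterator(docs);
        int counter = 0;                            // assumed starting state
        int limit = binSize;
        int randomIndex = random.nextInt(binSize);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            if (counter == randomIndex) {
                sample.add(doc);
            }
            counter++;
            if (counter >= limit) {
                counter = 0;
                limit = binSize;
                randomIndex = random.nextInt(binSize);
            }
        }
        return sample;
    }

    /** Advance-based loop: jumps ahead by a random gap measured in doc IDs. */
    static List<Integer> sampleByAdvance(int[] docs, int binSize, Random random) {
        List<Integer> sample = new ArrayList<>();
        ArrayDocIterator it = new ArrayDocIterator(docs);
        int doc = it.nextDoc();
        if (doc == NO_MORE_DOCS) {
            return sample;                          // no hits at all
        }
        int gap = random.nextInt(binSize) + 1;      // always >= 1, so target > current doc
        for (doc = it.advance(doc + gap); doc != NO_MORE_DOCS; doc = it.advance(doc + gap)) {
            sample.add(doc);
            gap = random.nextInt(binSize) + 1;
        }
        return sample;
    }

    public static void main(String[] args) {
        // 100 hits whose doc IDs are 1000 apart (a sparse result set).
        int[] sparse = new int[100];
        for (int i = 0; i < sparse.length; i++) {
            sparse[i] = i * 1000;
        }
        System.out.println(sampleByCounting(sparse, 10, new Random(42)).size()); // prints 10
        System.out.println(sampleByAdvance(sparse, 10, new Random(42)).size());  // prints 99
    }
}
```

With sparse doc IDs the advance-based loop lands on every hit after the first, because each gap of at most binSize is smaller than the distance to the next hit. So some correction for hit density would probably be needed before the two approaches are comparable.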
What do you think?