Posted to java-user@lucene.apache.org by Johan Stuyts <j....@hippo.nl> on 2006/07/26 11:50:58 UTC
Method to speed up caching for faceted navigation
Hi,
I am working on faceted navigation. This is nothing new, but I am
anticipating an index that changes very frequently (every couple of
seconds). After the index has been updated, I need to cache the bit sets
of the facet values so I can do counting during searches later on.
Because I need to get a lot of bit sets often, this needs to be as fast
as possible.
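A minimal sketch of the counting step, assuming each facet value's documents are cached in a java.util.BitSet and the current search hits are available as another BitSet (the class and method names are hypothetical, not part of any Lucene API). A facet count is then the cardinality of the intersection:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: for each cached facet value, counts how many of the
// current search hits also carry that facet value.
final class FacetCounter {
    private FacetCounter() {}

    static Map<String, Integer> count(Map<String, BitSet> facetBits, BitSet hits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : facetBits.entrySet()) {
            // Copy first so the intersection does not modify the cached bit set.
            BitSet both = (BitSet) e.getValue().clone();
            both.and(hits);
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }
}
```

The clone keeps the cached bit sets unchanged, so the same cache can serve every search until the index is updated again.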
I did the following:
IndexReader ir = ...;
BitSet bitSet = new BitSet(ir.maxDoc());
TermDocs td = ir.termDocs(new Term("facet name", "facet value"));
try {
  while (td.next()) {
    bitSet.set(td.doc());
  }
} finally {
  td.close();
}
The problem with this code is that it gets the document IDs one by one.
I tried to optimize the loop by reading blocks of IDs with
'read(int[], int[])', but this did not have a noticeable effect.
I looked at the implementation of 'read(int[], int[])' in
'SegmentTermDocs' and saw that it did the following things:
- check if the document has a frequency higher than 1, and if so read
it;
- check if the document has been deleted, and if so don't add it to the
result;
- store the document IDs, counts and frequencies in attributes instead of
local variables.
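To make the list above concrete, here is a self-contained sketch of that per-document decoding over an in-memory byte array rather than a real segment file. It assumes the postings layout Lucene used for .frq files at the time: each entry is a VInt docCode equal to (docDelta << 1) | 1 when the frequency is 1, or (docDelta << 1) followed by a separate VInt frequency otherwise. The class name is hypothetical, and unlike Lucene's read(int[], int[]) it decodes all df entries in one call, so the arrays must hold at least df elements.

```java
import java.util.BitSet;

// Hypothetical sketch of general postings decoding over one term's entries.
final class GeneralPostingsReader {
    private final byte[] data;
    private int pos;

    GeneralPostingsReader(byte[] data) {
        this.data = data;
    }

    // Variable-length int: 7 bits per byte, low-order group first,
    // high bit set on every byte except the last.
    private int readVInt() {
        byte b = data[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Decodes 'df' entries into docs/freqs, skipping documents marked in
    // 'deleted' (may be null). Returns the number of documents stored.
    int read(int df, BitSet deleted, int[] docs, int[] freqs) {
        int doc = 0;
        int stored = 0;
        for (int count = 0; count < df; count++) {
            int docCode = readVInt();
            doc += docCode >>> 1;                           // shift off the freq-is-one bit
            int freq = (docCode & 1) != 0 ? 1 : readVInt(); // explicit freq only when bit is 0
            if (deleted == null || !deleted.get(doc)) {     // deleted docs stay in the postings
                docs[stored] = doc;
                freqs[stored] = freq;
                stored++;
            }
        }
        return stored;
    }
}
```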
Given that the following preconditions hold in my situation:
- all documents have a frequency of 1 for the term;
- I never delete documents using the 'IndexReader' from which I get the
'TermDocs' object;
- I am only interested in the document IDs.
I made 'SegmentTermDocs' a public class and added the following method,
which eliminates the overhead of the 'read(int[], int[])' method:
public void readDocsWithoutFreqsAssumingNoDeletions(final BitSet destination)
    throws IOException {
  int count = this.count;
  final int df = this.df;
  int doc = this.doc;
  while (count < df) {
    // With freq == 1 every entry is the VInt (docDelta << 1) | 1, so
    // shifting off the low bit leaves the gap to the previous doc.
    doc += freqStream.readVInt() >>> 1;
    count++;
    destination.set(doc);
  }
  // Leave a consistent state
  this.doc = doc;
  freq = 1;
  this.count = df;
}
By using the method above I gained a speed improvement of over 20%.
Will this method always work correctly given the preconditions?
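On the freq-is-1 precondition: if any entry does carry an explicit frequency, readDocsWithoutFreqsAssumingNoDeletions never reads it, so the next readVInt() returns that frequency as if it were a docCode and every later doc ID is wrong. The self-contained sketch below assumes the .frq entry layout of the time (VInt (docDelta << 1) | 1 when freq is 1, otherwise (docDelta << 1) followed by a VInt freq) and shows the cost of guarding against that case: one bit test per entry, plus skipping the frequency bytes when present. The class name is hypothetical, and deletions are still ignored.

```java
import java.util.BitSet;

// Hypothetical fast path over an in-memory copy of one term's postings.
final class FastDocIdReader {
    private FastDocIdReader() {}

    // Sets one bit per entry. Deleted documents are NOT filtered out.
    static void readDocs(byte[] data, int df, BitSet destination) {
        int pos = 0;
        int doc = 0;
        for (int count = 0; count < df; count++) {
            // Inline VInt decode of docCode (7 bits per byte, low group first).
            byte b = data[pos++];
            int docCode = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = data[pos++];
                docCode |= (b & 0x7F) << shift;
            }
            doc += docCode >>> 1;
            if ((docCode & 1) == 0) {
                // An explicit freq follows: skip its bytes so the next docCode
                // is read from the right offset. The method above omits this
                // step and relies on every freq being 1.
                while ((data[pos++] & 0x80) != 0) {
                    // consume continuation bytes of the freq VInt
                }
            }
            destination.set(doc);
        }
    }
}
```

Whether the extra branch eats into the reported 20% gain was not measured in this thread.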
Kind regards,
Johan Stuyts
Hippo
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Method to speed up caching for faceted navigation
Posted by Chris Hostetter <ho...@fucit.org>.
: I looked at the implementation of 'read(int[], int[])' in
: 'SegmentTermDocs' and saw that it did the following things:
: - check if the document has a frequency higher than 1, and if so read
: it;
: - check if the document has been deleted, and if so don't add it to the
: result;
: - store the document IDs, counts and frequencies in attributes instead of
: local variables.
:
: Given that the following preconditions hold in my situation:
: - all documents have a frequency of 1 for the term;
: - I never delete documents using the 'IndexReader' from which I get the
: 'TermDocs' object;
: - I am only interested in the document IDs.
I don't think it really matters whether you do deletes on the same
IndexReader: what matters is whether any documents were deleted from the
index since it was last optimized, before you opened the reader. This is
because deleting a document only records the deletion; no Term/Doc mapping
information is removed.
So if your index is read-only and will never contain any deleted
documents, then your method may be safe (I'm not 100% certain of that,
just guessing), but if it's possible for documents to be deleted from your
index at some point, then I'm 99% sure your approach will result in it
indicating matches on documents which are no longer "viable".
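If deletions cannot be ruled out, one option is to keep the fast read and clear deleted documents from the bit set afterwards. Lucene's IndexReader offers hasDeletions() and isDeleted(int) for this; the sketch below is self-contained and takes those checks as plain parameters (the class name and the IntPredicate stand-in are assumptions, not part of Lucene). It costs one extra pass over the set bits, and only when the reader actually has deletions.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical post-pass that removes deleted documents from a bit set
// filled without consulting deletions.
final class DeletionFilter {
    private DeletionFilter() {}

    // 'isDeleted' stands in for a per-document check such as
    // IndexReader.isDeleted(int); 'hasDeletions' lets callers skip the pass.
    static void removeDeleted(BitSet docs, boolean hasDeletions, IntPredicate isDeleted) {
        if (!hasDeletions) {
            return; // no deletions recorded: the fast-path result is already exact
        }
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            if (isDeleted.test(doc)) {
                docs.clear(doc);
            }
        }
    }
}
```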
-Hoss