Posted to java-user@lucene.apache.org by Johan Stuyts <j....@hippo.nl> on 2006/07/26 11:50:58 UTC

Method to speed up caching for faceted navigation

Hi,

I am working on faceted navigation. This is nothing new, but I am
anticipating an index that changes very frequently (every couple of
seconds). After the index has been updated, I need to cache the bit sets
of the facet values so I can do counting during searches later on.
Because I need to build many bit sets, and often, this needs to be as fast
as possible.
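
As a minimal sketch of the counting step mentioned above: once a facet
value has a cached bit set, its count for a search is the cardinality of
the intersection with the bit set of matching documents. The class and
method names are hypothetical, and only java.util.BitSet is used.

```java
import java.util.BitSet;

final class FacetCounter {
    /** Counts the documents that match the query and carry the facet value. */
    static int count(BitSet queryMatches, BitSet facetValueDocs) {
        // Clone first so the cached facet bit set is not modified by and().
        BitSet intersection = (BitSet) facetValueDocs.clone();
        intersection.and(queryMatches);
        return intersection.cardinality();
    }

    public static void main(String[] args) {
        BitSet facet = new BitSet();
        facet.set(1); facet.set(4); facet.set(7);
        BitSet matches = new BitSet();
        matches.set(4); matches.set(5); matches.set(7);
        // Documents 4 and 7 are in both sets.
        System.out.println(count(matches, facet));
    }
}
```

The clone costs an allocation per count; it is kept here only so the
cached set stays reusable across searches.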

I did the following:
  IndexReader ir = ...;
  TermDocs td = ir.termDocs(new Term("facet name", "facet value"));
  while (td.next())
  {
    bitSet.set(td.doc());
  }

The problem with this code is that it gets the document IDs one by one.
I tried to optimize the loop by reading blocks of IDs with
'read(int[], int[])', but this did not have a noticeable effect.
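
For reference, a block-reading loop of this kind might look like the
sketch below. It assumes the 'TermDocs.read(int[] docs, int[] freqs)'
contract of returning the number of entries read, with zero meaning the
enumeration is exhausted. 'DocBlockSource' and 'fromArray' are
hypothetical stand-ins so the example runs without Lucene.

```java
import java.util.BitSet;

final class BlockReadSketch {
    /** Hypothetical stand-in for the part of Lucene's TermDocs used here. */
    interface DocBlockSource {
        /** Fills docs/freqs and returns the number of entries read; 0 means exhausted. */
        int read(int[] docs, int[] freqs);
    }

    /** Copies every document ID from the source into a bit set, one block at a time. */
    static BitSet collect(DocBlockSource source, int blockSize) {
        BitSet bits = new BitSet();
        int[] docs = new int[blockSize];
        int[] freqs = new int[blockSize]; // required by the signature, ignored here
        int n;
        while ((n = source.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                bits.set(docs[i]);
            }
        }
        return bits;
    }

    /** In-memory source used only to exercise the loop. */
    static DocBlockSource fromArray(final int[] allDocs) {
        return new DocBlockSource() {
            private int pos = 0;

            public int read(int[] docs, int[] freqs) {
                int n = Math.min(docs.length, allDocs.length - pos);
                for (int i = 0; i < n; i++) {
                    docs[i] = allDocs[pos + i];
                    freqs[i] = 1;
                }
                pos += n;
                return n;
            }
        };
    }
}
```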

I looked at the implementation of 'read(int[], int[])' in
'SegmentTermDocs' and saw that it did the following things:
- check if the document has a frequency higher than 1, and if so read
it;
- check if the document has been deleted, and if so don't add it to the
result;
- store the document IDs, counts and frequencies in attributes instead of
local variables.

Given that the following preconditions hold in my situation:
- all documents have a frequency of 1 for the term;
- I never delete documents using the 'IndexReader' from which I get the
'TermDocs' object;
- I am only interested in the document IDs.

I made 'SegmentTermDocs' a public class and added the following method.
This method eliminates the overhead in the 'read(int[], int[])' method:
  public void readDocsWithoutFreqsAssumingNoDeletions(final BitSet destination)
          throws IOException {
    int count = this.count;
    final int df = this.df;
    int doc = this.doc;
    while (count < df) {
      doc += freqStream.readVInt() >>> 1;
      count++;

      destination.set(doc);
    }
    // Leave a consistent state
    this.doc = doc;
    freq = 1;
    this.count = df;
  }

By using the method above I gained a speed improvement of over 20%.
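
To illustrate why the '>>> 1' works when every frequency is 1, here is a
self-contained sketch of the encoding the loop relies on. It assumes the
Lucene 2.x frequency file layout: each entry is a VInt holding the
document delta shifted left by one, with the low bit set when the
frequency is 1 (otherwise a separate frequency VInt follows). All class
and method names are hypothetical, and byte arrays stand in for
'freqStream'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

final class DocDeltaSketch {
    /** Writes a non-negative int as a VInt: 7 bits per byte, low-order group first. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit marks "more bytes follow"
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one VInt; assumes the stream holds a complete, well-formed value. */
    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Encodes sorted document IDs, all with frequency 1, as (delta << 1) | 1. */
    static byte[] encodeFreqOne(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : sortedDocs) {
            writeVInt(out, ((doc - last) << 1) | 1);
            last = doc;
        }
        return out.toByteArray();
    }

    /** Decodes df entries, dropping the low bit instead of testing it. */
    static BitSet decode(byte[] data, int df) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        BitSet bits = new BitSet();
        int doc = 0;
        for (int count = 0; count < df; count++) {
            doc += readVInt(in) >>> 1;
            bits.set(doc);
        }
        return bits;
    }
}
```

Under this layout, an entry with a frequency above 1 would be followed by
an extra VInt that this loop would misread as a document delta, which is
why the frequency precondition matters.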

Will this method always work correctly given the preconditions?

Kind regards,

Johan Stuyts
Hippo

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Method to speed up caching for faceted navigation

Posted by Chris Hostetter <ho...@fucit.org>.

: I looked at the implementation of 'read(int[], int[])' in
: 'SegmentTermDocs' and saw that it did the following things:
: - check if the document has a frequency higher than 1, and if so read
: it;
: - check if the document has been deleted, and if so don't add it to the
: result;
: - store the document IDs, counts and frequencies in attributes instead of
: local variables.
:
: Given that the following preconditions hold in my situation:
: - all documents have a frequency of 1 for the term;
: - I never delete documents using the 'IndexReader' from which I get the
: 'TermDocs' object;
: - I am only interested in the document IDs.

I don't think it really matters whether you do deletes on the same
IndexReader -- what matters is whether any deletes have been made to the
index since it was last optimized, before the reader was opened.  The reason
is that deleting a document just causes a record of the deletion to be
made, but no Term/Doc mapping information is removed.

So if your index is read only and will never contain any deleted
documents, then your method may be safe (i'm not 100% certain of that,
just guessing) but if it's possible for documents to be deleted from your
index at some point, then i'm 99% sure your approach will result in it
indicating matches on documents which are no longer "viable".
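
One possible way to keep the fast path while guarding against this,
sketched with plain java.util.BitSet: build the facet bit set without
deletion checks, then clear the deleted documents in a single pass. How
the deleted set is filled is an assumption here; with Lucene 2.x it could
be built by looping over 'IndexReader.isDeleted(int)' up to 'maxDoc()'
when 'hasDeletions()' returns true. The class name is hypothetical.

```java
import java.util.BitSet;

final class DeletionMask {
    /** Removes deleted document IDs from a facet bit set, in place. */
    static void maskDeleted(BitSet facetDocs, BitSet deletedDocs) {
        // andNot clears every bit in facetDocs that is set in deletedDocs.
        facetDocs.andNot(deletedDocs);
    }

    public static void main(String[] args) {
        BitSet facet = new BitSet();
        facet.set(1); facet.set(2); facet.set(8);
        BitSet deleted = new BitSet();
        deleted.set(2); deleted.set(5);
        maskDeleted(facet, deleted);
        System.out.println(facet); // document 2 is no longer reported
    }
}
```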


-Hoss

