Posted to java-user@lucene.apache.org by Greg Gershman <gr...@yahoo.com> on 2005/09/29 19:55:39 UTC

Revisiting FieldCacheImpl

Our search engine updates frequently, adding and
removing documents from the index.  After an index
update, we create a new Searcher in the background,
and execute a search against it to "prime" the sorting
by fields.  The new Searcher is swapped for the old. 
From my understanding, this is a fairly common
practice.
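
A minimal sketch of the warm-then-swap pattern described above, in plain
Java with no Lucene dependency. WarmSwapHolder, opener, and warmUp are
hypothetical names chosen for illustration; S stands in for whatever
Searcher type the application uses.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder, not a Lucene class. S stands in for a Searcher type.
final class WarmSwapHolder<S> {
    private final AtomicReference<S> current = new AtomicReference<>();

    WarmSwapHolder(S initial) {
        current.set(initial);
    }

    // Queries always read whichever searcher is currently published.
    S get() {
        return current.get();
    }

    // Open the replacement, run the warm-up step against it, then publish it.
    // Returns the previous searcher so the caller can close it once any
    // in-flight queries against it have finished.
    S reopen(Supplier<S> opener, Consumer<S> warmUp) {
        S fresh = opener.get();
        warmUp.accept(fresh); // e.g. one sorted search to fill the sort cache
        return current.getAndSet(fresh);
    }
}
```

The swap itself is cheap; the cost being discussed in this thread sits
entirely in the warm-up step.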

What I'm looking to optimize is the loading of the new
Searcher.  Since we've likely added documents,
possibly deleted a few, we have to create a new
sorting.  Those values are read out of the index and
sorted into an array that is cached.  I believe this
process could be sped up, since many of the values
that we are reading off disk are actually already
loaded into the old Searcher's sorting.  The problem
is, the FieldCacheImpl is keyed by IndexReader, then
field name/type, so the new Searcher cannot access
the old Searcher's cache.
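
To make the keying problem concrete, here is a simplified model of a cache
keyed first by reader instance and then by field name. It is an
illustration under assumptions, not the FieldCacheImpl source:
PerReaderFieldCache, the loads counter, and the use of plain Object as the
reader are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Simplified model: outer map keyed by reader instance, inner map by field.
final class PerReaderFieldCache {
    // Weak keys let an entry disappear once its reader is unreachable.
    private final Map<Object, Map<String, int[]>> byReader = new WeakHashMap<>();
    int loads = 0; // counts how often values had to be loaded from scratch

    synchronized int[] get(Object reader, String field, Supplier<int[]> loader) {
        Map<String, int[]> fields = byReader.computeIfAbsent(reader, r -> new HashMap<>());
        int[] values = fields.get(field);
        if (values == null) {
            values = loader.get(); // stands in for reading the field off disk
            loads++;
            fields.put(field, values);
        }
        return values;
    }
}
```

Because a newly opened reader is a different key, a lookup for the same
field misses and triggers a full load, even when most of the values are
unchanged.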

I'm playing around with making the caching work at the
field name/type level, and getting rid of caching by
Reader.  What this would mean is that all searchers
would use the same sorting; under certain
circumstances, a new sorting could be created using
data from an old, cached sorting, which might be
detected based on changes to the IndexReader used, but
wouldn't be keyed off of it.

Does anyone see any potential pitfalls that I'm
overlooking?  We generally use a single Searcher;
would there be problems if we used more than one? 
Anything else?  Is there some other requirement for
caching by IndexReader that I'm overlooking?

Thanks in advance,

Greg

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Revisiting FieldCacheImpl

Posted by Chris Hostetter <ho...@fucit.org>.

: I'm playing around with making the caching work at the
: field name/type level, and getting rid of caching by
: Reader.  What this would mean is that all searchers
: would use the same sorting; under certain
: circumstances, a new sorting could be created using
: data from an old, cached sorting, which might be
: detected based on changes to the IndexReader used, but
: wouldn't be keyed off of it.

Can you elaborate on how you would do this? ... how would you know when
opening a new searcher/reader whether or not the FieldCache for a particular
field needed to be updated without walking the list of terms for that
field?

I suppose there are some very limited cases where you can know that all of
the existing data is still valid (i.e., if there were no docs in the old
reader marked deleted, and no docs in the new reader marked deleted) and
you can just "append" the data for any newly added docs to the end -- but
I think finding what that data is still requires iteration over the entire
TermEnum/TermDocs for that field.
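
A rough sketch of that limited "append" case, in plain Java.
AppendOnlyExtension and valueForDoc are hypothetical; in a real index the
two flags and the document count would presumably come from
IndexReader.hasDeletions() and IndexReader.maxDoc(). The sketch shows only
the array reuse. It does not solve the problem raised above: valueForDoc
stands in for however the values of the new docs are obtained, which may
still mean walking the field's terms.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical helper, not part of Lucene.
final class AppendOnlyExtension {
    // Returns the extended array, or null when the shortcut does not apply
    // and the caller has to fall back to a full rebuild.
    static int[] extend(int[] oldValues, boolean oldHasDeletions,
                        boolean newHasDeletions, int newMaxDoc,
                        IntUnaryOperator valueForDoc) {
        if (oldHasDeletions || newHasDeletions || newMaxDoc < oldValues.length) {
            return null;
        }
        // Assumes existing doc ids are unchanged, so old entries stay valid.
        int[] extended = Arrays.copyOf(oldValues, newMaxDoc);
        for (int doc = oldValues.length; doc < newMaxDoc; doc++) {
            extended[doc] = valueForDoc.applyAsInt(doc);
        }
        return extended;
    }
}
```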


The one simple change I can think of that might be possible is adding a
good IndexReader.hashCode method so that different instances of an
IndexReader which still reference the exact same data won't require a
completely separate copy of the FieldCache arrays -- possibly something
based on the Directory.hashCode and the IndexReader.getVersion -- assuming
those methods work the way I suspect they do.
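
One way to picture that idea is a cache key built from the directory and
the index version rather than from the reader instance. ReaderSnapshotKey
below is a hypothetical class, and its two fields are stand-ins for
whatever Directory and version values the real API exposes. Note that a
map lookup needs equals as well as hashCode to treat two instances as the
same key.

```java
// Hypothetical cache key, not a Lucene class.
final class ReaderSnapshotKey {
    private final Object directory; // stand-in for the Directory the reader was opened on
    private final long version;     // stand-in for the index version the reader sees

    ReaderSnapshotKey(Object directory, long version) {
        if (directory == null) {
            throw new IllegalArgumentException("directory must not be null");
        }
        this.directory = directory;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ReaderSnapshotKey)) return false;
        ReaderSnapshotKey other = (ReaderSnapshotKey) o;
        return directory.equals(other.directory) && version == other.version;
    }

    @Override
    public int hashCode() {
        return 31 * directory.hashCode() + Long.hashCode(version);
    }
}
```

Two readers opened on the same directory at the same version would then
share one set of cached arrays. A real implementation would probably also
have to account for changes a reader holds that are not yet reflected in
the version, such as pending deletions.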



-Hoss