Posted to java-user@lucene.apache.org by Joe Paulsen <jo...@verizon.net> on 2004/03/31 01:24:06 UTC
Near performance question
Based on the nature of our documents, we sometimes
experience extremely long response times when executing
NEAR operations against a document (sometimes well over
minutes - even though the operation is restricted
to a single document).
Our analysis of the code indicates (we think):
  1. It looks up each of the terms in the word.dbx file.
  2. It intersects the occurrence lists. (So far so good!)
  3. It takes each gid found in the occurrence list and:
     a. Finds its parent, right up to the root of the document (in dom.dbx).
     b. Traverses the tree depth-first until it finds the node text of interest.
     c. Does the expected scan to find out if the term distance
        requirement is satisfied.
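
To make steps 2 and 3c concrete, here is a minimal sketch of how we picture them. It is not the engine's actual code. It assumes each occurrence list is an ascending array of gids and that, for one node's text, each term's token offsets are available as an ascending array. The class and method names (NearSketch, intersect, near) are made up for illustration.

```java
import java.util.Arrays;

class NearSketch {
    // Step 2 (assumed form): intersect two ascending gid lists with a linear merge.
    static long[] intersect(long[] x, long[] y) {
        long[] out = new long[Math.min(x.length, y.length)];
        int i = 0, j = 0, n = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {
                out[n++] = x[i];
                i++;
                j++;
            } else if (x[i] < y[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Step 3c (assumed form): true if some offset in a and some offset in b
    // are at most maxDistance tokens apart. Both arrays must be ascending.
    static boolean near(int[] a, int[] b, int maxDistance) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (Math.abs(a[i] - b[j]) <= maxDistance) {
                return true;
            }
            // Advance the pointer at the smaller offset; the gap can only shrink that way.
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] common = intersect(new long[]{2, 5, 9, 14}, new long[]{5, 7, 14, 20});
        System.out.println(Arrays.toString(common));
        System.out.println(near(new int[]{3, 40}, new int[]{10, 44}, 5));
    }
}
```

Both methods are linear in the length of their inputs, which is why we believe the time is going into steps 3a and 3b rather than into the intersection or the distance scan.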
We did some timings on our document (Rusticus).
It started off taking less than 1 second per occurrence (occ) and
grew to 25 seconds per occ.
Changing the dom.dbx buffers gave a significant improvement,
but it was still relatively slow (343 occs).
QUESTION:
It seems to us that the occs are ordered by gid
(and we don't do any updating). Is there
a simple way to reuse the tree-level positioning
information from the prior occurrence when
processing the current occurrence, so that
we don't have to start again from the
document root?
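
To illustrate what we are asking for, here is a rough sketch of the idea, not a patch against the real code. It assumes a parent lookup keyed by gid, modelled here as a plain in-memory map standing in for dom.dbx. The class AncestorPathCache and its methods are hypothetical. It keeps the root-to-node path of the previous occurrence and, for the next gid, climbs only until it reaches a node already on that path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AncestorPathCache {
    // Hypothetical stand-in for the dom.dbx parent lookup: child gid -> parent gid.
    // The root has no entry.
    private final Map<Long, Long> parentOf;
    // Root-to-node path resolved for the previous occurrence.
    private List<Long> lastPath = new ArrayList<>();
    // Number of parent lookups performed so far, to make the saving visible.
    private int lookups = 0;

    AncestorPathCache(Map<Long, Long> parentOf) {
        this.parentOf = parentOf;
    }

    // Returns the root-to-node path for gid, reusing the shared prefix of the previous path.
    List<Long> pathTo(long gid) {
        Deque<Long> suffix = new ArrayDeque<>();
        Long current = gid;
        int joinIndex = -1;
        while (current != null) {
            joinIndex = lastPath.indexOf(current);
            if (joinIndex >= 0) {
                break; // reached a node already on the previous path
            }
            suffix.push(current); // push to the front, so the suffix stays top-down
            lookups++;
            current = parentOf.get(current);
        }
        List<Long> path = new ArrayList<>(lastPath.subList(0, joinIndex + 1));
        path.addAll(suffix);
        lastPath = path;
        return path;
    }

    int lookups() {
        return lookups;
    }

    public static void main(String[] args) {
        Map<Long, Long> parents = new HashMap<>();
        parents.put(2L, 1L);
        parents.put(3L, 1L);
        parents.put(4L, 2L);
        parents.put(6L, 4L);
        parents.put(7L, 4L);
        AncestorPathCache cache = new AncestorPathCache(parents);
        System.out.println(cache.pathTo(6L) + " after " + cache.lookups() + " lookups");
        System.out.println(cache.pathTo(7L) + " after " + cache.lookups() + " lookups");
    }
}
```

In this toy tree the first occurrence climbs all the way to the root, while a sibling occurrence costs a single lookup. Whether the real gid scheme and the dom.dbx layout allow the depth-first positioning at each level to be kept in the same way is exactly what we are unsure about.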
Thanks,
Joe Paulsen