You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2005/01/06 15:04:28 UTC

Re: Span Query Performance

A bit more:

On Thursday 06 January 2005 10:22, Paul Elschot wrote:
> On Thursday 06 January 2005 02:17, Andrew Cunningham wrote:
> > Hi all,
> > 
> > I'm currently doing a query similar to the following:
> > 
> > for w in wordset:
> >     query = w near (word1 V word2 V word3 ... V word1422);
> >     perform query
> > 
> > and I am doing this through SpanQuery.getSpans(), iterating through the 
> > spans and counting
> > the matches, which can result in 4782282 matches (essentially I am only 
> > after the match count).
> > The query works but the performance can be somewhat slow; so I am 
wondering:
> > 
...
> > c) Is there a faster method to what I am doing I should consider?
> 
> Preindexing all word combinations that you're interested in.
> 

In case you know all the words in advance, you could also index a
helper word at the same position as each of those words.
This requires a custom analyzer that inserts the helper word in the
token stream with a zero position increment.
The query then simplifies to:
query = w near helperword
which would probably speed things up significantly.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org