Posted to java-user@lucene.apache.org by "Aigner, Thomas" <TA...@WescoDist.com> on 2005/10/05 15:05:19 UTC

Optimization

Howdy all,

	Have a question. Are there any obvious things that can be done
to help speed up query lookups, especially wildcard searches (e.g.
*lamps)?
	We have created a server application on a linux box that listens
to a socket and processes searches as they come in.  We thought that you
should only have one Index Searcher instantiated at a time but this
caused some result set issues so we create a new Index Searcher each
time a query comes in.  (Should this be the case?)  Also, when we scale
up our stress test to 8+ users at the same time, we are seeing large
latency between calling the search and getting results
(even though the search lookup itself is normally very fast, except for
wildcard lookups).  I am thinking that perhaps searches are being queued
while they wait for resources to do the actual lookup?

	If any of you have run into a problem akin to this, please don't
hesitate to reply with ideas and set me straight as to what I am doing
wrong.

We have played with the Java heap size, but a larger heap does not seem
to help much with speed; it only seems to help with the too many clauses
exceptions.

Hope this is not too vague..
Thanks all ahead of time,
Tom


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Optimization

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Oct 5, 2005, at 9:05 AM, Aigner, Thomas wrote:
>     Have a question. Are there any obvious things that can be done
> to help speed up query lookups, especially wildcard searches (e.g.
> *lamps)?

Obvious?  Sort of.  *lamps needs to scan through _every_ single term
in the index (for the specified field only, of course) because terms
are lexicographically ordered, and a leading wildcard gives the term
enumeration no prefix to seek to.

If you reverse terms during analysis and lay them in the same  
position (increment 0) as the original token you'd end up with  
"spmal..." terms.  Now pre-process the query string and if there is a  
prefixed wildcard query, reverse it so that "*lamps" turns into  
"spmal*" and you will likely achieve a dramatic speed-up.
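
To make the idea concrete, here is a minimal sketch in plain Java.  It
does not use the Lucene API: a TreeSet stands in for the sorted term
dictionary, and the class and method names (ReversedTermLookup, add,
endingWith) are made up for illustration.  For simplicity it keeps only
the reversed terms, not the originals at the same position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; a TreeSet plays the role of the sorted term dictionary. */
final class ReversedTermLookup {
    // Reversed terms, kept in lexicographic order like terms in an index.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    /** "Index" a term by storing its reversed form. */
    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Answers a leading-wildcard query such as "*lamps" (pass "lamps"). */
    List<String> endingWith(String suffix) {
        // "*lamps" becomes the prefix query "spmal*" over the reversed terms.
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // Seek straight to the first candidate instead of scanning every term.
        for (String reversed : reversedTerms.tailSet(prefix)) {
            if (!reversed.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(reverse(reversed)); // report the original term
        }
        return hits;
    }
}
```

With the terms lamps, clamps, lamp, stamps and lampshade added,
endingWith("lamps") seeks to "spmal", visits "spmal" and "spmalc", and
stops at "spmats" without touching the rest of the set.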

This is just one technique for dealing with prefixed wildcard  
queries.  There is more fun to be had with queries like *lamps*.  A  
technique I learned from the book Managing Gigabytes is to rotate  
terms through all their possible variations and index all of those,  
which also requires cleverness on the querying side of things.
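
A rough sketch of the rotation idea, again in plain Java rather than
Lucene.  The names (RotatedTermIndex, containing, and so on) and the use
of '$' as an end-of-term marker are assumptions made for this example;
the marker is assumed never to occur inside a term.  Every rotation of
term + '$' is stored, and each wildcard shape is rewritten into a prefix
lookup over those rotations.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration only; a TreeMap plays the role of the sorted term dictionary. */
final class RotatedTermIndex {
    private static final char END = '$'; // assumed never to appear in a term

    // Maps each rotation of term + '$' back to the original term.
    private final TreeMap<String, String> rotations = new TreeMap<>();

    /** Stores all rotations, e.g. "lamp$", "amp$l", "mp$la", "p$lam", "$lamp". */
    void add(String term) {
        String marked = term + END;
        for (int i = 0; i < marked.length(); i++) {
            rotations.put(marked.substring(i) + marked.substring(0, i), term);
        }
    }

    /** "*x*": some rotation starts with x. */
    SortedSet<String> containing(String infix) {
        return withRotationPrefix(infix);
    }

    /** "*x": some rotation starts with x followed by the end marker. */
    SortedSet<String> endingWith(String suffix) {
        return withRotationPrefix(suffix + END);
    }

    /** "x*": some rotation starts with the end marker followed by x. */
    SortedSet<String> startingWith(String prefix) {
        return withRotationPrefix(END + prefix);
    }

    private SortedSet<String> withRotationPrefix(String prefix) {
        SortedSet<String> hits = new TreeSet<>(); // de-duplicates repeated matches
        for (Map.Entry<String, String> e : rotations.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later rotation can match
            }
            hits.add(e.getValue());
        }
        return hits;
    }
}
```

The trade-off is index size: a term of n characters contributes n + 1
entries, in exchange for turning *lamps* into a single seek plus a short
range scan.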

     Erik


