Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/04/30 05:20:33 UTC

Reusing Query instances

Hi,

Is there any reason why one would *not* want to reuse Query instances?

I'm using MemoryIndex with a fixed set of queries and I'm executing them all on 
each new document that comes in.  Because each document needs to have many tens 
of thousands of queries executed against it, I thought I'd just run all queries 
through QueryParser once at the beginning, and then just reuse Query instances 
on each incoming document.  What I've noticed is that my fixed set of queries 
takes longer and longer to execute as time passes (more and more time is spent 
inside memoryIndex.search(....) somewhere).  The problem is not heap/memory - 
there is no crazy GCing and the heap is not full, but the CPU is 100% busy.

I should note that queries I'm dealing with are ugly and big, using lots of 
wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no faster 
Wildcard impl).
I should also emphasize that at this point I only *suspect* that maaaybe the 
gradual slowdown I'm seeing has something to do with the fact that I'm reusing 
Query instances.

Is there any reason why one should not reuse Query instances?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Reusing Query instances

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Otis,

> Is there any reason why one would *not* want to reuse Query instances?

Definitely not!
 
> I'm using MemoryIndex with a fixed set of queries and I'm executing them all
> on each new document that comes in.  Because each document needs to
> have many tens of thousands of queries executed against it, I thought I'd just
> run all queries through QueryParser once at the beginning, and then just
> reuse Query instances on each incoming document.  What I've noticed is that
> my fixed set of queries takes longer and longer to execute as time passes
> (more and more time is spent inside memoryIndex.search(....) somewhere).
> The problem is not heap/memory - there is no crazy GCing and the heap is
> not full, but the CPU is 100% busy.

You should still generate some dumps when it gets slow.
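
One way to capture such dumps from inside the JVM, assuming thread dumps are
what is meant here (running jstack against the process, or sending it
SIGQUIT with kill -3, works too). This is only a minimal sketch using the
standard java.lang.management API; the class name ThreadDumps is
hypothetical and not part of Lucene.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not a Lucene class): renders a stack dump of all live threads.
final class ThreadDumps {

    private ThreadDumps() {}

    /** Returns one block per live thread: name, state, then its stack frames. */
    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details to keep the call cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling ThreadDumps.capture() a few times, several seconds apart, once the
searches have become slow should show which frames below
memoryIndex.search(....) keep recurring.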

In general, reusing queries is perfectly fine, as the queries themselves are
only a shell for the query parameters, plus factories for new rewritten
queries (if needed) and for Weights/Scorers. Of course, you should not
reuse rewritten queries, as they largely depend on the underlying index
(which changes on each request).

> I should note that queries I'm dealing with are ugly and big, using lots of
> wildcards, but trailing and prefix ones (and this is Lucene 3.1, so no faster
> Wildcard impl).
> I should also emphasize that at this point I only *suspect* that maaaybe the
> gradual slowdown I'm seeing has something to do with the fact that I'm
> reusing Query instances.

Did this somehow change with 3.1, or was it the same in 3.0? In fact, for
each query execution a BitSet is allocated per segment, but since you use
MemoryIndex, the BitSet has only one slot *g* (so it's not an issue). For
MemoryIndex, it's more important that the term dictionary / positions are
optimized so that PhraseQueries and wildcard queries can execute quickly on
the term index. As said before, the queries from the query parser are only
used to rewrite against, producing index-specific queries. The reuse pattern
is OK and intended.

Another question: can you temporarily replace MemoryIndex with another simple
one-doc implementation (RAMDirectory), just to test whether it also slows
down? I don't like MemoryIndex at all (I know, it was not the bad guy for
your stack overflow).

Uwe

