You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Tomislav Poljak <tp...@gmail.com> on 2010/05/07 20:34:46 UTC

Filter vs. TermQuery performance

Hi,
when is it wise to replace a TermQuery with cached Filter (regarding
search performance). If TermQuery is used only to filter results based
on field value (it doesn't participate in scoring), is it alway wise to
replace it with filter? Is it only wise if Filter is cached (wrapped in
CachingWrapperFilter) and reused often?


Does it matter how many distinct values field has (which is related to
how many matches/results for one given/selected value is returned and
also with how many times same filter instance is reused)?

For example, what if filter for single value matches only 5% of docs,
should filter be used or is it better to use TermQuery? 

What about if filter for single value matches 20%? or 50% or 75%

I have a question regarding caching performance/memory usage. Documents
have date&time indexed (as NumericField) with minute resolution and
there are few thousands unique date&time in index. On the search side
open ended range filter is used (NumericRangeFilter) with current time
as a parameter.

Now, is it wise to cache NumericRangeFilter here (reuse instance of
CachingWrapperFilter wrapping NumericRangeFilter) since it will not be
reused often (only from users searching at same time in same time zone)?

Is it better to use NumericRangeFilter or NumericRangeQuery in this
case?

Tomislav








---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Filter vs. TermQuery performance

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I think others will have more thoughts on this, esp. for Numeric* questions... but I'll try answering...
 

----- Original Message ----
> From: Tomislav Poljak <tp...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Fri, May 7, 2010 2:34:46 PM
> Subject: Filter vs. TermQuery performance
> 
> Hi,
> when is it wise to replace a TermQuery with cached Filter 
> (regarding search performance). If TermQuery is used only to filter results 
> based on field value (it doesn't participate in scoring), is it alway wise 
> to replace it with filter? 

Yes, assuming the filter will be reused.  I think there is not a lot of value in using a filter (vs. just a regular query) if that filter will not be reused.  This is why in Solr "fq"s (filtered queries) are cached in a special filter cache.  I *think* the only other benefit of using a filter query vs., say, TermQuery, is that the former will not spend any time/CPU on computing the score for the filter part.

> Is it only wise if Filter is cached (wrapped in CachingWrapperFilter) and reused often?

I think so.  See above.

> Does it matter how many 
> distinct values field has (which is related to how many matches/results for 
> one given/selected value is returned and also with how many times same filter 
> instance is reused)?

I *think* it matters.  I think the more docs a filter matches, the higher the benefit from reusing a filter.

> For example, what if filter for single value matches 
> only 5% of docs, should filter be used or is it better to use TermQuery? 
> What about if filter for single value matches 20%? or 50% or 
> 75%

I'm not sure...

> I have a question regarding caching performance/memory usage. 
> Documents have date&time indexed (as NumericField) with minute resolution 
> and there are few thousands unique date&time in index. On the search 
> side open ended range filter is used (NumericRangeFilter) with current 
> time as a parameter.

> Now, is it wise to cache NumericRangeFilter here 
> (reuse instance of CachingWrapperFilter wrapping NumericRangeFilter) since it 
> will not be reused often (only from users searching at same time in same time 
> zone)?

If the cache hit rate is low, why waste memory on caching is what I would think is the logic to apply here.
If you have 3 queries, and each uses a different date range query, then you will not see benefits from caching..
If 2 of those 3 queries use the exact same date range query, then you will see caching benefits.

> Is it better to use NumericRangeFilter or NumericRangeQuery in this case?

I'm not sure, but I'd be happy to add specific advice to Javadoc when the answer is clear.

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org