You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Robert Stewart <Ro...@INFONGEN.COM> on 2008/07/25 13:36:04 UTC

caching fields for query performance

If I have a frequently queried field, which has a single value per document (such as language), how can I pre-cache all field values, such that the underlying query processing always uses memory cache (never disk i/o) for that particular field?  I don't know if it is possible without some custom query processing, which may be difficult in my case.

We have incoming ad-hoc Boolean queries, for instance:

+topic:m&a +topic:earn  +company:MSFT +language:(ENG FRA RUS)


We have translated contents per-language, so frequently, the language clause may contain up to 30 language field values.  The performance differences between:

+topic:m&a +topic:earn  +company:MSFT +language:(ENG FRA RUS)

And

+topic:m&a +topic:earn  +company:MSFT

Is very significant (mostly since language is OR clause).

So is there some way to optimize query processing on a per-field basis, such that language clauses are processed more efficiently?

Thanks
Bob

Re: caching fields for query performance

Posted by Yonik Seeley <yo...@apache.org>.
Yes, pull out language:ENG language:FRA language:RUS into filters,
cache them, and re-use them across all queries.

In Lucene, see CachingWrapperFilter()
In Solr, use a separate fq (for filter query) parameter...
  q=+topic:m&a +topic:earn  +company:MSFT&fq=language:ENG

-Yonik

On Fri, Jul 25, 2008 at 7:36 AM, Robert Stewart
<Ro...@infongen.com> wrote:
> If I have a frequently queried field, which has a single value per document (such as language), how can I pre-cache all field values, such that the underlying query processing always uses memory cache (never disk i/o) for that particular field?  I don't know if it is possible without some custom query processing, which may be difficult in my case.
>
> We have incoming ad-hoc Boolean queries, for instance:
>
> +topic:m&a +topic:earn  +company:MSFT +language:(ENG FRA RUS)
>
>
> We have translated contents per-language, so frequently, the language clause may contain up to 30 language field values.  The performance differences between:
>
> +topic:m&a +topic:earn  +company:MSFT +language:(ENG FRA RUS)
>
> And
>
> +topic:m&a +topic:earn  +company:MSFT
>
> Is very significant (mostly since language is OR clause).
>
> So is there some way to optimize query processing on a per-field basis, such that language clauses are processed more efficiently?
>
> Thanks
> Bob
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org