Posted to solr-user@lucene.apache.org by Mike Klaas <mi...@gmail.com> on 2008/02/01 01:18:49 UTC

Re: Slow response times using *:*

On 31-Jan-08, at 9:41 AM, Andy Blower wrote:

> Yonik Seeley wrote:
>>
>>> This surprises me because the filter query submitted has usually already
>>> been submitted along with a normal query, and so should be cached in the
>>> filter cache. Surely all solr needs to do is return a handful of fields for
>>> the first 100 records in the list from the cache - or so I thought.
>>
>> To calculate the DocSet (the set of all documents matching *:* and
>> your filters), Solr can just use its caches as long as *:* and the
>> filters have been used before.
>>
>> *But*, to retrieve the top 10 documents matching *:* and your filters,
>> the query must be re-run.  That is probably where the time is being
>> spent.  Since you aren't looking for relevancy scores at all, but just
>> faceting, it seems like we could potentially optimize this in Solr.
>>
>
> I'm actually retrieving the first 100 in my tests, which will be necessary
> in one of the two scenarios we use blank queries for. The other scenario
> doesn't require any docs at all - just the facets, and I've not put that in
> my tests. What would the situation be if I specified a sort order for the
> facets and/or retrieved no docs at all? I'd be sorting the facets
> alphabetically, which is currently done by my app rather than the search
> engine. (since I sometimes have to merge facets from more than one field)

First question:  What is the use of retrieving 100 documents if there  
is no defined sort order?

The situation could be optimized in Solr, but there is a related case  
that _is_ optimized that should be almost as fast.  If you

a) don't ask for the document score in the field list (fl)
b) enable <useFilterForSortedQuery> in solrconfig.xml
c) specify _some_ sort order other than score

Then Solr will do cached bitset intersections only.  It will also do  
sorting, but that may not be terribly expensive.  If it is close to  
the desired performance, it would be relatively easy to patch Solr to  
skip the sort.

(Note: this is query sort, not facet sort).

> I had assumed that no doc would be considered more relevant than any other
> without any query terms - i.e. filter query terms wouldn't affect relevance.
> This seems sensible to me, but maybe that's only because our current search
> engine works that way.

Filter queries won't affect relevance, but Solr will still try to  
calculate the score if you ask it to (all docs will score the same, though).

> Regarding optimization, I certainly think that being able to access all
> facets for subsets of the indexed data (defined by the filter query) is an
> incredibly useful feature. My search engine usage may not be very common
> though. What it means to us is that we can drive all aspects of our sites
> from the search engine, not just the obvious search forms.

I also use this feature.  It would be useful to optimize the case  
where rows=0.
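
For the facets-only case, the client-side merge and alphabetical sort
described above might look something like the sketch below.  It assumes
the facet_fields section of a JSON response in the flat
[value, count, value, count, ...] layout, and it assumes that summing
counts across fields is the merge you want (a document carrying the same
value in two fields is counted twice).  merge_facets and the field names
are hypothetical.

```python
from collections import Counter


def merge_facets(facet_fields, field_names):
    """Merge facet counts from several fields and sort them alphabetically."""
    merged = Counter()
    for name in field_names:
        flat = facet_fields.get(name, [])
        # Flat layout: values at even positions, counts at odd positions.
        for value, count in zip(flat[0::2], flat[1::2]):
            merged[value] += count
    # Case-insensitive alphabetical order, done in the app rather than in Solr.
    return sorted(merged.items(), key=lambda item: item[0].lower())


# Made-up facet data for two hypothetical fields.
sample = {
    "author": ["smith", 3, "adams", 2],
    "editor": ["adams", 1, "brown", 4],
}
merged = merge_facets(sample, ["author", "editor"])
```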

-Mike