Posted to solr-user@lucene.apache.org by Sébastien Lamy <la...@free.fr> on 2009/09/07 12:37:50 UTC

Faceting optimization

Hi

I'm currently trying to optimize the response time of my Solr server.
I found one anomaly and hope you can help me solve it:
If, across the whole document index, a field has many possible 
values, requesting facets on that field dramatically increases 
response time, even if the search returns only one document with only 
one facet value for that field. The three requests at the bottom of 
this mail show this.

It seems to me that Solr looks at all the possible values in the whole 
index for the faceted field, whereas it should look only at the values 
present in the documents of the result set, which would be a lot 
faster. Is there a way to ask it to do so?
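
The difference between the two strategies can be sketched as follows. This is a toy Python illustration, not Solr's actual code: the data, the function names and the in-memory structures are all invented for the example, and it only shows why one approach scales with the number of distinct values while the other scales with the number of hits.

```python
from collections import Counter

# Toy index (hypothetical data): document id -> single value of the faceted field.
FIELD_VALUES = {1: "air du temps", 2: "chanel", 3: "dior", 4: "air du temps"}


def facet_by_term_enumeration(result_docs):
    """Walk every distinct value in the index and intersect it with the results.

    The loop runs once per distinct value, so the cost grows with the
    number of values in the whole index, even when there is a single hit.
    """
    # Build value -> set of document ids (a stand-in for an inverted index).
    postings = {}
    for doc_id, value in FIELD_VALUES.items():
        postings.setdefault(value, set()).add(doc_id)

    counts = Counter()
    for value, docs in postings.items():
        n = len(docs & result_docs)
        if n:
            counts[value] = n
    return counts


def facet_by_result_docs(result_docs):
    """Look up the field value of each matching document only.

    The loop runs once per hit, so the cost grows with the result size.
    """
    return Counter(FIELD_VALUES[doc_id] for doc_id in result_docs)


if __name__ == "__main__":
    hits = {1}  # a search that matches a single document
    print(facet_by_term_enumeration(hits))
    print(facet_by_result_docs(hits))
```

Both functions return the same counts; they differ only in what they iterate over.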


---
Let's look at these three requests:

1- This request returns only one document and takes 3 ms
http://localhost:8983/solr/select/?
rows=10&
q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T99999]


2- This request returns one document and its facets for one field. It 
takes about 1000 ms. The facet on a_10_alpha_sort returns only one value: 
"air du temps". But across the whole index, a_10_alpha_sort has many 
values (>10 000).
http://localhost:8983/solr/select/?
facet=true&
rows=10&
q=(available_owner_display_name_s_facet:%22mag%22)+AND+type_s:[T0+TO+T99999]&
facet.field=a_10_alpha_sort&
f.a_10_alpha_sort.facet.mincount=1&
f.a_10_alpha_sort.facet.sort=true&
f.a_10_alpha_sort.facet.limit=8&

3- This request includes the value "air du temps" in the query string. 
It takes 3 ms
http://localhost:8983/solr/select/?
rows=10&
q=(available_owner_display_name_s_facet:%22mag%22+AND+a_10_alpha_sort:"air+du+temps")+AND+type_s:[T0+TO+T99999]


Here is the definition of the faceted field in my schema. It is a 
single-valued field that is not tokenized:

<dynamicField name="*_alpha_sort" type="alphaOnlySort" indexed="true" 
stored="false" multivalued="false"/>
<fieldType name="alphaOnlySort" class="solr.TextField" 
sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

Re: Faceting optimization

Posted by Grant Ingersoll <gs...@apache.org>.
If you add &debugQuery=true to your query, it should give you a
breakdown of the time spent in the various components.

Also, are you doing any warming? You might also look at the new
faceting method in Solr 1.4.
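
As a rough sketch of both suggestions, the Python snippet below rebuilds the slow request from the original mail with debugQuery=true added and, optionally, a per-field facet.method override. It assumes Solr 1.4, where facet.method accepts enum or fc; please check the faceting documentation for your version. The helper name faceted_debug_url is made up for this example, and the snippet only builds the URL without sending it.

```python
from urllib.parse import urlencode

# Base URL taken from the requests in the original mail.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def faceted_debug_url(facet_method=None):
    """Build the faceted request with timing/debug output enabled.

    facet_method is assumed to be "enum" or "fc" (Solr 1.4 facet.method);
    when None, the parameter is left out and Solr picks its default.
    """
    params = [
        ("q", '(available_owner_display_name_s_facet:"mag") AND type_s:[T0 TO T99999]'),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.field", "a_10_alpha_sort"),
        ("f.a_10_alpha_sort.facet.mincount", "1"),
        ("f.a_10_alpha_sort.facet.sort", "true"),
        ("f.a_10_alpha_sort.facet.limit", "8"),
        ("debugQuery", "true"),  # asks Solr to include debug information
    ]
    if facet_method is not None:
        # Per-field override, so other facet fields keep their default method.
        params.append(("f.a_10_alpha_sort.facet.method", facet_method))
    # urlencode percent-encodes the query and turns spaces into "+".
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(faceted_debug_url())      # current behaviour, with debug output
    print(faceted_debug_url("fc"))  # same request, trying the other method
```

Comparing the debug output of the two URLs should show whether the facet component accounts for the extra time and whether the other method helps.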

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/