Posted to solr-user@lucene.apache.org by Paolo Castagna <ca...@googlemail.com> on 2010/10/11 17:30:26 UTC

facet.method: enum vs. fc

Hi,
I am using Solr v1.4 and I am not sure which facet.method I should use.

What should I use if I do not know in advance if the number of values
for a given field will be high or low?

What are the pros/cons of using facet.method=enum vs. facet.method=fc?

When should I use enum vs. fc?

I have found some comments and suggestions here:

  "enum enumerates all terms in a field, calculating the set intersection
   of documents that match the term with documents that match the query.
   This was the default (and only) method for faceting multi-valued fields
   prior to Solr 1.4.
  "fc (stands for field cache), the facet counts are calculated by
   iterating over documents that match the query and summing the terms
   that appear in each document. This was the default method for single
   valued fields prior to Solr 1.4.
   The default value is fc (except for BoolField) since it tends to use
   less memory and is faster when a field has many unique terms in the
   index."
   -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
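To make the two descriptions above concrete, here is a toy sketch in Python. It is not Solr's implementation: the in-memory index, the field values, and the function names are all hypothetical, and zero counts are dropped only so that both functions return the same shape.

```python
from collections import Counter

# Toy index: document id -> terms in the facet field (hypothetical data).
DOCS = {
    0: ["red"],
    1: ["red", "blue"],
    2: ["green"],
    3: ["blue"],
}


def facet_counts_enum(docs, matching):
    """enum-style: for every term in the field, intersect the set of
    documents containing that term with the set matching the query."""
    postings = {}
    for doc_id, terms in docs.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    counts = {}
    for term, doc_ids in postings.items():
        n = len(doc_ids & matching)
        if n:  # drop zero counts (a simplification for this sketch)
            counts[term] = n
    return counts


def facet_counts_fc(docs, matching):
    """fc-style: walk the documents matching the query and tally the
    terms that appear in each one."""
    counts = Counter()
    for doc_id in matching:
        counts.update(docs[doc_id])
    return dict(counts)


matching_docs = {0, 1, 3}  # pretend these are the query hits
print(facet_counts_enum(DOCS, matching_docs))
print(facet_counts_fc(DOCS, matching_docs))
```

Both functions give the same counts; what differs is the work done. The enum-style loop scales with the number of distinct terms, while the fc-style loop scales with the number of matching documents and the values per document.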

  "facet.method=enum [...] this is excellent for fields where there is
   a small set of distinct values. The average number of values per
   document does not matter.
   facet.method=fc [...] this is excellent for situations where the
   number of indexed values for the field is high, but the number of
   values per document is low. For multi-valued fields, a hybrid approach
   is used that uses term filters from the filterCache for terms that
   match many documents."
   -- http://wiki.apache.org/solr/SolrFacetingOverview

  "If you are faceting on a field that you know only has a small number
   of values (say less than 50), then it is advisable to explicitly set
   this to enum. When faceting on multiple fields, remember to set this
   for the specific fields desired and not universally for all facets.
   The request handler configuration is a good place to put this."
   -- Book: "Solr 1.4 Enterprise Search Server", p. 148
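As a rough illustration of setting the method per field rather than globally, the sketch below builds a query string using Solr's per-field override form f.<field>.facet.method. The field names (category, author) and the helper name are made up for the example, and it passes the settings as request parameters instead of putting them in the request handler configuration that the book recommends.

```python
from urllib.parse import urlencode


def facet_params(query, methods):
    """Build a Solr query string that facets on each field in `methods`
    and sets facet.method for that field only (hypothetical helper)."""
    params = [("q", query), ("facet", "true")]
    for field, method in methods.items():
        params.append(("facet.field", field))
        # Per-field override: applies to this field, not to all facets.
        params.append((f"f.{field}.facet.method", method))
    return urlencode(params)


# "category" is assumed to have few distinct values, "author" many.
print(facet_params("*:*", {"category": "enum", "author": "fc"}))
```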

This is the part of the Solr code which deals with the facet.method
parameter:

   if (enumMethod) {
     counts = getFacetTermEnumCounts([...]);
   } else {
     if (multiToken) {
       UnInvertedField uif = [...]
       counts = uif.getCounts([...]);
     } else {
       [...]
       if (per_segment) {
         [...]
         counts = ps.getFacetCounts([...]);
       } else {
         counts = getFieldCacheCounts([...]);
       }
     }
   }
   -- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java

See also:

  - http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values

In the end, since I do not know in advance the number of distinct
values for my fields, I went for facet.method=fc. Does this seem
reasonable to you?

Thank you,
Paolo

Re: facet.method: enum vs. fc

Posted by Paolo Castagna <ca...@googlemail.com>.
Thank you Erick, your explanation was helpful.
I'll stick with fc and come back to this later if I need further tuning.

Paolo


Re: facet.method: enum vs. fc

Posted by Erick Erickson <er...@gmail.com>.
Yep, that was probably the best choice....

It's a classic time/space tradeoff. The enum method creates a bitset for
#each# unique facet value. The bit set is (maxdocs / 8) bytes in size
(I'm ignoring some overhead here). So if your facet field has 10 unique
values, and 8M documents, you'll use up 10M bytes or so. 20 unique values
will use up 20M bytes and so on. But this is very, very fast.
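
The arithmetic above can be written down as a small back-of-the-envelope helper. This is only a sketch of the estimate as stated (one bit per document per unique value, overhead ignored); the function name is made up and nothing here measures what Solr actually allocates.

```python
def enum_bitset_bytes(unique_values, max_docs):
    """Rough memory estimate for enum faceting: one bitset of
    max_docs bits (max_docs / 8 bytes) per unique facet value."""
    bytes_per_bitset = (max_docs + 7) // 8  # round up to whole bytes
    return unique_values * bytes_per_bitset


print(enum_bitset_bytes(10, 8_000_000))  # 10 values, 8M docs
print(enum_bitset_bytes(20, 8_000_000))  # 20 values, 8M docs
```

The estimate grows linearly with the number of unique values, which is why a field with many distinct values gets expensive under enum.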

fc on the other hand, eats up cache for storing the string value for each
unique value, plus various counter arrays (several bytes/doc). For most
cases, it will use less memory than enum, but will be slower.

I'd stick with fc for the time being and think about enum if 1> you have
a good idea of what the number of unique terms is or 2> you start to need
to finely tune your speed.

HTH
Erick
