Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/12/28 01:59:10 UTC

[Solr Wiki] Trivial Update of "SolrFacetingOverview" by JJLarrea

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by JJLarrea:
http://wiki.apache.org/solr/SolrFacetingOverview

The comment on the change is:
Removed CategoryCategory

------------------------------------------------------------------------------

Any number of [:SimpleFacetParameters#facet.field:facet.field] parameters can be passed to the request handler. For each facet.field, one of two approaches will be used:

- * Field Queries: If the facet field is defined in the schema as multi-valued, boolean, or tokenized, then every indexed value for the field will be iterated and a facet query will be executed and cached (as described above). This is excellent for fields where there is a small set of distinct values. For example, faceting on a field with U.S. States eg. `Alabama, Alaska, ... Wyoming` would lead to fifty cached queries which would be used over and over again. It also works in the case when the facet field can have multiple values for each document. However, it requires excessive amounts of memory and time when the number of field values is large and especially when it exceeds the filter cache size defined in [:SolrCaching#filterCache:filterCache]
+ * '''Field Queries''': If the facet field is defined in the schema as multi-valued, boolean, or tokenized, then every indexed value for the field will be iterated and a facet query will be executed and cached (as described above). This is excellent for fields where there is a small set of distinct values. For example, faceting on a field with U.S. States eg. `Alabama, Alaska, ... Wyoming` would lead to fifty cached queries which would be used over and over again. It also works in the case when the facet field can have multiple values for each document. However, it requires excessive amounts of memory and time when the number of field values is large and especially when it exceeds the filter cache size defined in [:SolrCaching#filterCache:filterCache]

- * Field Cache: If the facet field is not tokenized, not multi-valued, and not boolean, then a field-cache approach will be used. This is currently implemented with the Lucene [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FieldCache.html FieldCache] mechanism used for results sorting. An array of integers (one for every document in the index) is allocated, pre-filled with the first indexed value for that field in each document (offset into a table of strings for fields indexed as strings), and cached. Every time that facet.field is used for faceting a query, all the document IDs resulting from the query are looked up in the field cache and any value found has its tally incremented. This is excellent for situations where the number of indexed values for the field is too large to be practical using the field queries mechanism, such as faceting against authors or titles. However it is currently much slower and more memory-intensive than the field query mechanism for fields with a small number of values.
+ * '''Field Cache''': If the facet field is not tokenized, not multi-valued, and not boolean, then a field-cache approach will be used. This is currently implemented with the Lucene [http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FieldCache.html FieldCache] mechanism used for results sorting. An array of integers (one for every document in the index) is allocated, pre-filled with the first indexed value for that field in each document (offset into a table of strings for fields indexed as strings), and cached. Every time that facet.field is used for faceting a query, all the document IDs resulting from the query are looked up in the field cache and any value found has its tally incremented. This is excellent for situations where the number of indexed values for the field is too large to be practical using the field queries mechanism, such as faceting against authors or titles. However it is currently much slower and more memory-intensive than the field query mechanism for fields with a small number of values.

Note at this time there is no way to manually control whether facet.field is handled via field queries or field cache other than defining in the schema whether the field is single- or multi-valued and the analyzer used: `solr.TextField` is always tokenized while `solr.StrField` is never. Control may be improved in the future, along with a means to handle multi-valued fields with a variant of the Field Cache mechanism.
- ----
- CategoryCategory
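
As a rough illustration of the two facet.field approaches described in the changed page, the following Python sketch contrasts them on a toy in-memory "index". It is not Solr or Lucene code: the `DOCS` data and the function names `build_field_cache`, `facet_by_field_cache`, and `facet_by_field_queries` are hypothetical, and a plain dict stands in for the filter cache.

```python
# Hypothetical toy index: position = document ID, value = the single
# untokenized value of the facet field for that document.
DOCS = ["Smith", "Jones", "Smith", "Lee", "Jones", "Smith"]


def build_field_cache(values):
    """Build a table of distinct strings plus one integer offset per document."""
    terms = sorted(set(values))
    position = {term: i for i, term in enumerate(terms)}
    ords = [position[value] for value in values]
    return terms, ords


def facet_by_field_cache(terms, ords, result_doc_ids):
    """Field-cache style: look up each matching document and bump a tally."""
    counts = [0] * len(terms)
    for doc_id in result_doc_ids:
        counts[ords[doc_id]] += 1
    return {terms[i]: n for i, n in enumerate(counts) if n}


def facet_by_field_queries(values, result_doc_ids, filter_cache):
    """Field-query style: one cached document set per distinct value,
    intersected with the query result."""
    result = set(result_doc_ids)
    counts = {}
    for term in set(values):
        if term not in filter_cache:
            # Stand-in for executing and caching a filter query for this value.
            filter_cache[term] = frozenset(
                i for i, value in enumerate(values) if value == term
            )
        n = len(filter_cache[term] & result)
        if n:
            counts[term] = n
    return counts


terms, ords = build_field_cache(DOCS)
cache = {}
by_cache = facet_by_field_cache(terms, ords, [0, 2, 3])
by_queries = facet_by_field_queries(DOCS, [0, 2, 3], cache)
```

In this sketch the field-cache path does work proportional to the number of matching documents, while the field-query path does one set intersection per distinct value and keeps one cached set per value, which mirrors the trade-off the page describes between fields with few and many distinct values.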