Posted to solr-user@lucene.apache.org by Andrew Laird <an...@gettyimages.com> on 2012/06/07 10:01:37 UTC
How to cap facet counts beyond a specified limit
We have an index with ~100M documents and I am looking for a simple way to speed up faceted searches. Is there a relatively straightforward way to stop counting the number of matching documents beyond some specifiable value? For our needs we don't really need to know that a particular facet has exactly 14,203,527 matches - just knowing that there are "more than a million" is enough. If I could somehow limit the hit counts to a million (say) it seems like that could decrease the work required to compute the values (just stop counting after the limit is reached) and potentially improve faceted search time - especially when we have 20-30 fields to facet on. Has anyone else tried to do something like this?
Many thanks for comments and info,
Sincerely,
andy laird | gettyimages | 206.925.6728
Re: How to cap facet counts beyond a specified limit
Posted by Jack Krupansky <ja...@basetechnology.com>.
Sounds like an interesting improvement to propose.
It will also depend on various factors, such as number of unique terms in a
field, field type, etc.
Which field types are giving you the most trouble and how many unique values
do they have? And do you specify a facet.method or just let it default?
What release of Solr are you on? Are you using "trie" for numeric fields?
Are these mostly string fields? Any boolean fields?
-- Jack Krupansky
Re: How to cap facet counts beyond a specified limit
Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2012-06-07 at 10:01 +0200, Andrew Laird wrote:
> For our needs we don't really need to know that a particular facet has
> exactly 14,203,527 matches - just knowing that there are "more than a
> million" is enough. If I could somehow limit the hit counts to a
> million (say) [...]
It should be feasible to stop the collector after 1M documents have been
processed, if nothing else by simply ignoring subsequent IDs.
However, the IDs received would be in index order, which normally means
old-to-new. If the nature of the corpus, and thereby the facet values,
changes over time, this change would not be reflected in the facets that
have many hits, as the collector never reaches the newer documents.
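To illustrate the index-order caveat, here is a minimal sketch in plain Python. It is not Solr code: the function name capped_facet_counts, the "medium" field and the tiny corpus are all made up for illustration. It stops counting after a fixed number of matching documents and shows that values found only in newer documents never appear.

```python
from collections import Counter
from itertools import islice


def capped_facet_counts(matching_docs, field, cap):
    """Count facet values for at most `cap` matching documents, in index order."""
    counts = Counter()
    # islice stops the iteration after `cap` documents; later IDs are ignored.
    for doc in islice(matching_docs, cap):
        counts[doc[field]] += 1
    return counts


# Hypothetical corpus in index order: older documents first, newer ones last.
docs = [{"medium": "film"}] * 6 + [{"medium": "digital"}] * 4

full = capped_facet_counts(docs, "medium", cap=len(docs))
capped = capped_facet_counts(docs, "medium", cap=5)
```

With the cap at 5, only the oldest documents are visited, so the "digital" value that appears later in the index is missing from the capped result even though the uncapped count finds it.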
> it seems like that could decrease the work required to
> compute the values (just stop counting after the limit is reached) and
> potentially improve faceted search time - especially when we have 20-30
> fields to facet on. Has anyone else tried to do something like this?
The current Solr facet implementation treats every facet structure
individually. It works fine in a lot of areas but it also means that the
list of IDs for matching documents is iterated once for every facet: In
the sample case, 14M+ hits * 25 fields = 350M+ hits processed.
I have been experimenting with an alternative approach (SOLR-2412) that
packs the terms of all facets into a single structure under the hood,
which means only 14M+ hits are processed in the current case.
Unfortunately it is not mature and only works for text fields.
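The difference in hit-list iterations can be sketched in plain Python. This is only an illustration of the counting argument, not the Solr facet code and not the SOLR-2412 implementation; the function names, fields and documents are hypothetical. Note that the single-pass variant still updates one counter per field for each hit; what it saves is the repeated iteration over the list of matching documents.

```python
from collections import Counter


def count_per_field(hits, fields):
    """Iterate the hit list once per facet field (hits * fields visits)."""
    visits = 0
    counts = {f: Counter() for f in fields}
    for f in fields:
        for doc in hits:
            visits += 1
            counts[f][doc[f]] += 1
    return counts, visits


def count_single_pass(hits, fields):
    """Iterate the hit list once, updating one shared (field, value) counter."""
    visits = 0
    packed = Counter()
    for doc in hits:
        visits += 1
        for f in fields:
            packed[(f, doc[f])] += 1
    # Unpack the shared structure into per-field counts for comparison.
    counts = {f: Counter() for f in fields}
    for (f, value), n in packed.items():
        counts[f][value] = n
    return counts, visits


# Hypothetical fields and matching documents.
fields = ["medium", "orientation"]
hits = [
    {"medium": "film", "orientation": "landscape"},
    {"medium": "digital", "orientation": "landscape"},
    {"medium": "digital", "orientation": "portrait"},
]

per_field_counts, per_field_visits = count_per_field(hits, fields)
single_counts, single_visits = count_single_pass(hits, fields)
```

Both functions return the same counts, but the first visits each hit once per field while the second visits each hit once in total, which mirrors the 350M+ versus 14M+ figures above.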
- Toke Eskildsen, State and University Library, Denmark