
Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2010/06/02 20:27:58 UTC

Re: minpercentage vs. mincount

: Obviously I could implement this in userland (just like mincount, for 
: that matter), but I wonder if anyone else sees use in being able to 
: define that a facet must match a minimum percentage of all documents in 
: the result set, rather than a hardcoded value? The idea is that while 
: I might not be interested in a facet that only covers 3 documents in the 
: result set if there are, let's say, 1000 documents in the result set, the 
: situation would be a lot different if I only had 10 documents in the 
: result set.
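The proposed "minpercentage" boils down to turning a share of numFound into an absolute count. A minimal Python sketch, assuming the percentage is given in percent (1 means 1%) and rounded up; `min_count_for_percentage` is a hypothetical helper, not a Solr parameter or API:

```python
import math
from fractions import Fraction


def min_count_for_percentage(num_found: int, min_percentage) -> int:
    """Translate a hypothetical "minpercentage" into an absolute facet.mincount.

    Fraction(str(...)) keeps decimal inputs such as 0.5 exact, so the
    threshold is not skewed by binary floating-point rounding.
    """
    if num_found < 0:
        raise ValueError("num_found must be non-negative")
    threshold = Fraction(str(min_percentage)) * num_found / 100
    # Round up: a facet must cover *at least* the requested share of results.
    return math.ceil(threshold)


# With 1%, a facet covering 3 documents fails against 1000 results
# (threshold 10) but passes against 10 results (threshold 1).
for total in (1000, 10):
    print(total, min_count_for_percentage(total, 1))
```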

Typically people deal with this type of situation by using facet.limit to 
ensure they only get the "top" N constraints back, and they set 
facet.mincount to something low just to save bandwidth on counts that are 
"too low to care about no matter how few results there are" (i.e. counts 
of 0).
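The usual approach described above can be sketched as a request that asks only for the top N constraints and drops zero counts. This assumes Solr's standard faceting parameters; the field name `category`, the query `ipod`, and the base URL are placeholders. When facet.limit is positive, facet.sort defaults to count, so the "top" N are the highest counts:

```python
from urllib.parse import urlencode


def top_n_facet_params(query: str, field: str, limit: int = 10, mincount: int = 1) -> str:
    """Build a query string for a top-N facet request (field name is a placeholder)."""
    params = {
        "q": query,
        "rows": 0,                   # only facet counts are wanted, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,        # return only the N highest-count constraints
        "facet.mincount": mincount,  # 1 drops constraints with a count of 0
        "wt": "json",
    }
    return urlencode(params)


# Hypothetical base URL for a local single-core Solr instance.
print("http://localhost:8983/solr/select?" + top_n_facet_params("ipod", "category"))
```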

: I have not yet seen such a feature. Would it make sense to file it as a 
: feature request, or should stuff like this rather be done in userland (I 
: have noticed, for example, that Solr prefers to have users normalize 
: scores in userland too)?

Feel free to file a feature request. Truthfully, this is kind of a hard 
problem to solve in userland: you'd either have to do two queries (the 
first to get numFound, the second with facet.mincount set to an integer 
relative to numFound), or you'd have to do a single query, ask for a "big" 
value for facet.limit, and hope that you get enough constraints back to 
prune your list.
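The two-query option can be sketched in Python as follows. It assumes a caller-supplied `fetch` function (hypothetical) that sends Solr request parameters and returns the parsed JSON response, and that facet counts arrive in the JSON writer's default flat layout `[term1, count1, term2, count2, ...]`. The fake transport below only stands in for a real Solr server:

```python
import math
from fractions import Fraction
from typing import Callable, Dict, List

# Hypothetical transport: takes Solr request params, returns the parsed JSON response.
Fetch = Callable[[Dict[str, object]], dict]


def facet_with_min_percentage(fetch: Fetch, query: str, field: str, min_percentage) -> List:
    # Query 1: no documents and no faceting, just numFound.
    first = fetch({"q": query, "rows": 0, "wt": "json"})
    num_found = first["response"]["numFound"]

    # Turn the percentage into an absolute facet.mincount, rounded up and never
    # below 1, so zero-count constraints are always dropped.
    mincount = max(1, math.ceil(Fraction(str(min_percentage)) * num_found / 100))

    # Query 2: same query, now faceting with a mincount relative to numFound.
    second = fetch({
        "q": query,
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,  # negative limit: return every constraint passing mincount
        "facet.mincount": mincount,
    })
    return second["facet_counts"]["facet_fields"][field]


calls = []


def fake_fetch(params):
    """Stand-in for an HTTP request to Solr; returns canned JSON-shaped dicts."""
    calls.append(params)
    response = {"response": {"numFound": 1000, "docs": []}}
    if params.get("facet") == "true":
        flat = []
        for term, count in [("red", 400), ("blue", 50), ("green", 3)]:
            if count >= params["facet.mincount"]:  # emulate Solr applying facet.mincount
                flat += [term, count]
        response["facet_counts"] = {"facet_fields": {params["facet.field"]: flat}}
    return response


print(facet_with_min_percentage(fake_fetch, "*:*", "color", 1))
```

The price is an extra round trip, and if the index changes between the two requests, the numFound used for the threshold may be slightly stale.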

Off the top of my head, though, I can't really think of a sane way to do 
this on the server side that would work with distributed search either, 
but go ahead and open an issue and let's see what the folks who are really 
smart about distributed searching have to say.


-Hoss

Re: minpercentage vs. mincount

Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.

Thanks for your reply!

On 02.06.2010, at 20:27, Chris Hostetter wrote:

> Feel free to file a feature request. Truthfully, this is kind of a hard 
> problem to solve in userland: you'd either have to do two queries (the 
> first to get numFound, the second with facet.mincount set to an integer 
> relative to numFound), or you'd have to do a single query, ask for a "big" 
> value for facet.limit, and hope that you get enough constraints back to 
> prune your list.

Well, I would probably implement it by not setting a limit at all, and then reducing the facets based on the number of results (numRows) before sending them to the client (i.e. the browser).
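The single-query variant amounts to client-side pruning. A minimal Python sketch, assuming the request used facet.limit=-1 (Solr's default facet.limit is 100, so simply omitting the parameter still caps the list) and that counts arrive in the default flat JSON layout; `prune_facets` is a hypothetical helper:

```python
from fractions import Fraction
from typing import List, Tuple


def prune_facets(flat_counts: List, num_found: int, min_percentage) -> List[Tuple[str, int]]:
    """Keep constraints covering at least min_percentage percent of num_found.

    flat_counts is the flat layout [term1, count1, term2, count2, ...].
    """
    share = Fraction(str(min_percentage)) / 100
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    # Compare count >= share * num_found instead of dividing, so num_found == 0
    # is safe; zero counts are always dropped.
    return [(term, count) for term, count in pairs
            if count > 0 and count >= share * num_found]


# With a 1% threshold, a facet covering 3 documents is dropped from a
# 1000-result set but kept in a 10-result set.
print(prune_facets(["red", 400, "blue", 50, "green", 3], 1000, 1))
print(prune_facets(["red", 6, "green", 3], 10, 1))
```

This avoids a second request, but the full, unlimited facet list has to travel from Solr to the application before pruning, which can be large for high-cardinality fields.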

> Off the top of my head, though, I can't really think of a sane way to do 
> this on the server side that would work with distributed search either, 
> but go ahead and open an issue and let's see what the folks who are really 
> smart about distributed searching have to say.


OK, I have created it:
https://issues.apache.org/jira/browse/SOLR-1937

regards,
Lukas Kahwe Smith
mls@pooteeweet.org