Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2006/09/04 01:33:02 UTC

Re: [jira] Created: (SOLR-44) Basic Facet Count support

: : One thing that my facet code does is compute the count for all items
: : that have _no_ terms in a particular field, and makes an
: : <unspecified> count as well.  It does this by putting all documents
: : found into a DocSet as it iterates through all terms for a field, and
: : then .andNot'ing it away from an all docs query.  Not pretty, but
: : does work and works quite fast.

when i went to add this, it occurred to me that it was probably just easier
to get the DocSet for "field:[* TO *]" and andNot that with the main set of
matches -- it means computing one additional (large) DocSet for each field
- but for fields with a lot of terms it should be a lot faster than doing
one andNot per term. ... and i'm 99.9999% sure it's functionally
equivalent (right?)

(this only works for "facet fields" like this of course ... if/when we
have arbitrarily complex "facets" defined by a set of rules implemented as
queries we'd certainly need the DocSet.andNot approach)
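
To make the comparison concrete, here is a minimal sketch of the two ways of
counting documents with no value in a field. It is not Solr code:
java.util.BitSet stands in for a DocSet, and the class and method names
(MissingFacetCount, missingViaUnion, missingViaRange) are hypothetical. The
hasAnyValue argument is assumed to be whatever the "field:[* TO *]" query
would match.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class MissingFacetCount {

    // Helper: build a doc-id set from a list of ids (BitSet stands in for a DocSet).
    public static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    // Approach 1: union the doc set of every term in the field,
    // then do a single andNot against the main set of matches.
    public static int missingViaUnion(BitSet matches, List<BitSet> termDocSets) {
        BitSet seen = new BitSet();
        for (BitSet termDocs : termDocSets) {
            seen.or(termDocs); // one union per term
        }
        BitSet missing = (BitSet) matches.clone();
        missing.andNot(seen); // one andNot per field
        return missing.cardinality();
    }

    // Approach 2: take one precomputed set of all docs that have any value
    // in the field (assumed to be the result of field:[* TO *]) and andNot it.
    public static int missingViaRange(BitSet matches, BitSet hasAnyValue) {
        BitSet missing = (BitSet) matches.clone();
        missing.andNot(hasAnyValue);
        return missing.cardinality();
    }

    public static void main(String[] args) {
        BitSet matches = docs(0, 1, 2, 3, 4, 5);
        List<BitSet> termDocSets = Arrays.asList(docs(0, 1), docs(1, 2, 7));
        BitSet hasAnyValue = docs(0, 1, 2, 7); // union of the term sets above

        // Docs 3, 4 and 5 match but have no term in the field: both print 3.
        System.out.println(missingViaUnion(matches, termDocSets));
        System.out.println(missingViaRange(matches, hasAnyValue));
    }
}
```

The two methods agree whenever hasAnyValue really is the union of the
per-term sets; the second just skips the per-term loop at the price of one
extra (large) set per field.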


-Hoss


Re: [jira] Created: (SOLR-44) Basic Facet Count support

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 3, 2006, at 7:33 PM, Chris Hostetter wrote:

>
> : : One thing that my facet code does is compute the count for all items
> : : that have _no_ terms in a particular field, and makes an
> : : <unspecified> count as well.  It does this by putting all documents
> : : found into a DocSet as it iterates through all terms for a field, and
> : : then .andNot'ing it away from an all docs query.  Not pretty, but
> : : does work and works quite fast.
>
> when i went to add this, it occurred to me that it was probably just easier
> to get the DocSet for "field:[* TO *]" and andNot that with the main set of
> matches -- it means computing one additional (large) DocSet for each field
> - but for fields with a lot of terms it should be a lot faster than doing
> one andNot per term. ... and i'm 99.9999% sure it's functionally
> equivalent (right?)

Yeah, I believe it's equivalent.  I just haven't gotten used to the  
[* TO *] option, so overlooked it.  I only do one .andNot per field,  
though, not per term.  Though I do a .union per term.  It is bound to  
be quicker with your approach.  I look forward to deprecating my work  
for this :)

> (this only works for "facet fields" like this of course ... if/when we
> have arbitrarily complex "facets" defined by a set of rules implemented as
> queries we'd certainly need the DocSet.andNot approach)

Good point.

	Erik