Posted to solr-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2014/10/03 21:42:19 UTC

Question about filter cache size

Say I have a boolean field named 'hidden', and less than 1% of the
documents in the index have hidden=true.
Do both of these filter queries use the same docset cache size?
fq=hidden:false
fq=!hidden:true

Peter

Re: Question about filter cache size

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Fri, Oct 3, 2014 at 6:38 PM, Peter Keegan <pe...@gmail.com> wrote:
>> it will be cached as hidden:true and then inverted
> Inverted at query time, so for best query performance use fq=hidden:false,
> right?

Yep.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data

Re: Question about filter cache size

Posted by Peter Keegan <pe...@gmail.com>.
> it will be cached as hidden:true and then inverted
Inverted at query time, so for best query performance use fq=hidden:false,
right?

On Fri, Oct 3, 2014 at 3:57 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <pe...@gmail.com>
> wrote:
> > Say I have a boolean field named 'hidden', and less than 1% of the
> > documents in the index have hidden=true.
> > Do both these filter queries use the same docset cache size? :
> > fq=hidden:false
> > fq=!hidden:true
>
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>

Re: Question about filter cache size

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Fri, Oct 3, 2014 at 4:35 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 10/3/2014 1:57 PM, Yonik Seeley wrote:
>> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <pe...@gmail.com> wrote:
>>> Say I have a boolean field named 'hidden', and less than 1% of the
>>> documents in the index have hidden=true.
>>> Do both these filter queries use the same docset cache size? :
>>> fq=hidden:false
>>> fq=!hidden:true
>>
>> Nope... !hidden:true will be smaller in the cache (it will be cached
>> as hidden:true and then inverted)
>> The downside is that you'll pay the cost of that inversion.
>
> I would think that unless it's using hashDocSet, the cached data for
> every filter would always be the same size.  The wiki says that
> hashDocSet is no longer used for filter caching as of 1.4.0.  Is that
> actually true?

Yes, SortedIntDocSet is used instead.  It stores an int per match
(i.e. 4 bytes per match).  This change was made so in-order traversal
could be done efficiently.
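
To put rough numbers on this, here is a minimal Python sketch comparing the
two representations mentioned in this thread: a sorted list of ints at
4 bytes per match, and a bitset at one bit per document in the index. The
index size and match count are made-up values, object overhead is ignored,
and the function names are illustrative only, not Solr APIs.

```python
def bitset_bytes(max_doc: int) -> int:
    """Approximate size of a bitset holding one bit per document in the index."""
    return (max_doc + 7) // 8


def sorted_int_bytes(num_matches: int) -> int:
    """Approximate size of a sorted int list: 4 bytes per matching document."""
    return 4 * num_matches


def smaller_representation(max_doc: int, num_matches: int) -> str:
    """Name whichever of the two representations needs fewer bytes."""
    if sorted_int_bytes(num_matches) < bitset_bytes(max_doc):
        return "sorted ints"
    return "bitset"


# Hypothetical numbers: a 10M-document index where 0.5% have hidden=true.
max_doc = 10_000_000
hidden_true = 50_000
hidden_false = max_doc - hidden_true

print(sorted_int_bytes(hidden_true))                   # 200000
print(bitset_bytes(max_doc))                           # 1250000
print(smaller_representation(max_doc, hidden_true))    # sorted ints
print(smaller_representation(max_doc, hidden_false))   # bitset
```

Under these assumptions the hidden:true set is roughly six times smaller as
a sorted int list than as a bitset, while the hidden:false set would be far
larger as a list, so only the small set benefits.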

-Yonik

Re: Question about filter cache size

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/3/2014 1:57 PM, Yonik Seeley wrote:
> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <pe...@gmail.com> wrote:
>> Say I have a boolean field named 'hidden', and less than 1% of the
>> documents in the index have hidden=true.
>> Do both these filter queries use the same docset cache size? :
>> fq=hidden:false
>> fq=!hidden:true
> 
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.

I would think that unless it's using hashDocSet, the cached data for
every filter would always be the same size.  The wiki says that
hashDocSet is no longer used for filter caching as of 1.4.0.  Is that
actually true?  Is my understanding of filterCache completely out of
touch with reality?

https://wiki.apache.org/solr/SolrCaching#The_hashDocSet_Max_Size

This does bring to mind an optimization that might help memory usage in
cases where either a very small or very large percentage of documents
match the filter: do run-length encoding on the bitset.  If the RLE
representation is at least N percent smaller than the bitset, use that
representation instead.
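
A rough Python sketch of that idea, under assumptions of my own: the bitset
is modeled as a list of booleans, each run length is assumed to cost 4 bytes,
and the "N percent" threshold is passed in as min_saving_percent. None of
these names correspond to anything in Solr; this only illustrates the
comparison described above.

```python
from typing import List, Sequence


def run_lengths(bits: Sequence[bool]) -> List[int]:
    """Encode bits as alternating run lengths, starting with a run of False.

    If the first bit is True, the leading False run has length 0.
    """
    runs: List[int] = []
    current = False
    length = 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current = bit
            length = 1
    runs.append(length)
    return runs


def decode(runs: Sequence[int]) -> List[bool]:
    """Rebuild the original bits from alternating run lengths."""
    bits: List[bool] = []
    value = False
    for length in runs:
        bits.extend([value] * length)
        value = not value
    return bits


def choose_encoding(bits: Sequence[bool], min_saving_percent: float) -> str:
    """Pick RLE only if it is at least min_saving_percent smaller than the bitset."""
    bitset_size = (len(bits) + 7) // 8
    rle_size = 4 * len(run_lengths(bits))  # assumption: 4 bytes per run length
    if rle_size <= bitset_size * (1 - min_saving_percent / 100):
        return "rle"
    return "bitset"


# A sparse filter: 5 matches clustered together in 2000 documents.
sparse = [False] * 1000 + [True] * 5 + [False] * 995
print(run_lengths(sparse))            # [1000, 5, 995]
print(choose_encoding(sparse, 25))    # rle

# Worst case for RLE: matches alternate with non-matches.
alternating = [i % 2 == 0 for i in range(2000)]
print(choose_encoding(alternating, 25))   # bitset
```

The alternating case shows why the threshold matters: when matches are
scattered rather than clustered, the run list can end up much larger than
the plain bitset.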

I think the first iteration of an RLE option would have it always on or
always off, controlled in solrconfig.xml.  A config mode where Solr
attempts RLE on every bitset and periodically reports efficiency
statistics would be pretty nice.  That data might be useful to define
default thresholds for a future automatic mode.

Thanks,
Shawn


Re: Question about filter cache size

Posted by Yonik Seeley <yo...@heliosearch.com>.
On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <pe...@gmail.com> wrote:
> Say I have a boolean field named 'hidden', and less than 1% of the
> documents in the index have hidden=true.
> Do both these filter queries use the same docset cache size? :
> fq=hidden:false
> fq=!hidden:true

Nope... !hidden:true will be smaller in the cache (it will be cached
as hidden:true and then inverted).
The downside is that you'll pay the cost of that inversion at query time.
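
As a toy illustration of that trade-off (not how Solr implements it), the
Python sketch below keeps only the small hidden:true set as a sorted list
and excludes those documents from each query's candidates. The function
names and doc ids are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def contains(sorted_ids: Sequence[int], doc_id: int) -> bool:
    """Binary search for doc_id in a sorted list of doc ids."""
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def apply_negated_filter(candidates: Sequence[int],
                         cached_hidden_true: Sequence[int]) -> List[int]:
    """Keep candidates that are NOT in the cached set.

    This exclusion step runs on every query, which is the extra cost
    paid in exchange for caching the smaller set.
    """
    return [doc for doc in candidates if not contains(cached_hidden_true, doc)]


cached_hidden_true = [3, 7]        # small cached set, kept sorted
candidates = [1, 3, 5, 7, 9]       # docs matching the main query
print(apply_negated_filter(candidates, cached_hidden_true))   # [1, 5, 9]
```

Caching hidden:false directly would skip the exclusion step but store the
much larger set, which is the choice being weighed in this thread.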

-Yonik