Posted to solr-user@lucene.apache.org by Matteo Grolla <ma...@gmail.com> on 2016/01/05 14:58:33 UTC

enable disable filter query caching based on statistics

Hi,
    after watching the CloudSearch presentation from Lucene Revolution 2014
https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
(at minute 17:08)

I realized I'd love to be able to take the burden of disabling filter
query caching off developers.

The problem:
Solr caches filter queries by default.
a) When many filter queries are never reused and only a few are, the
useful ones get evicted unnecessarily.
b) If the same query has multiple filter queries that are very selective, I
noticed a big performance gain from disabling the cache.
c) I'd like to spare developers from deciding what should be cached and what
shouldn't.

The question:
- Is there anything already available that solves these problems?

What do you think about this?
- I was thinking of writing a plugin that recognizes query types with regular
expressions and lets Solr admins associate a caching behaviour with each query
type.
- Another idea was to:
   - set fq caching off by default
   - keep statistics about each fq
   - enable caching only for the N fqs with the highest hit ratio
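
A rough sketch of the regular-expression idea, done on the client side rather than as a Solr plugin. Everything here is an assumption made for illustration: the rule table, the function name apply_cache_rule, and which fields count as reusable. Only the {!cache=false} local param is taken from Solr itself.

```python
import re

# Hypothetical rule table: (pattern matched against the raw fq string, cache it?).
# The field names come from this thread; the rules themselves are assumptions.
CACHE_RULES = [
    (re.compile(r"^n_rea:"), False),            # near-unique values, rarely reused
    (re.compile(r"^(provincia|type):"), True),  # low-cardinality, reused often
]


def apply_cache_rule(fq, default_cache=True):
    """Prefix fq with {!cache=false} when the first matching rule says not to cache."""
    if fq.startswith("{!"):
        return fq  # the developer already chose local params; leave them alone
    for pattern, cache in CACHE_RULES:
        if pattern.search(fq):
            return fq if cache else "{!cache=false}" + fq
    return fq if default_cache else "{!cache=false}" + fq


print(apply_cache_rule("n_rea:xxx"))
print(apply_cache_rule("provincia:yyyy"))
```

The point of keeping the rules in one table is that an admin could change caching behaviour without touching the code that builds the queries.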

Re: enable disable filter query caching based on statistics

Posted by Alessandro Benedetti <ab...@apache.org>.
I read that the client was happy, so I am only curious to know more :)
Readability aside, shouldn't it be more efficient to put the filters
directly in the main query if you don't cache them?
(Checking the code: when not caching, Solr adds a Lucene boolean query,
specifically with 0 score. Maybe this is an indication that, at the current
stage, this claim is not true anymore.
In the past it was a better approach than having them in separate filters.)
How do you specify that a filter should be a postFilter and run only over the
query result cache?
Of course I don't know if you are excluding filters via tags or have some
other requirements.
I saw you reported the gain in rpm; what about the query time?
Related to the rest of the issue, there is also this comment in the Solr source
code:

org/apache/solr/search/SolrIndexSearcher.java:1597
...

// now actually use the filter cache.
// for large filters that match few documents, this may be
// slower than simply re-executing the query.
if (out.docSet == null) {
  out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
  DocSet bigFilt = getDocSet(cmd.getFilterList());
  if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
}

...

Cheers



Re: enable disable filter query caching based on statistics

Posted by Erick Erickson <er...@gmail.com>.
Binoy:

bq: In such a case won't applying fqs normally be the same as applying
them as post filters

Certainly not, at least AFAIK...

By definition, regular FQs are calculated over the entire corpus
(NOT just the docs that satisfy the query). Then that entire
bitset is stored in the filterCache where it can be reused, which
is why filterCache entries can be used for different queries.

Also by definition, post filters are _not_ calculated over the
entire corpus, they are only calculated for docs that
1> pass the query criteria
and
2> pass all lower-cost filters
so they will not apply at all to the next query, are not stored in
the filterCache etc.

So I think what Matteo is seeing is that with a restrictive FQ clause,
very few docs have to be tested against most of the FQs.
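
To make the difference concrete, here is a toy model in Python. It is not how Solr or Lucene implement any of this: the corpus, the dictionary standing in for the filterCache, and the function names are all made up for illustration. It only shows that a cached filter is computed corpus-wide once and then intersected, while the non-cached style tests just the docs that survived earlier clauses.

```python
# Toy corpus: doc id -> fields. All values are made up for illustration.
DOCS = {
    0: {"n_rea": "a1", "provincia": "MI", "type": "x"},
    1: {"n_rea": "b2", "provincia": "MI", "type": "x"},
    2: {"n_rea": "c3", "provincia": "RM", "type": "x"},
    3: {"n_rea": "d4", "provincia": "MI", "type": "y"},
}

filter_cache = {}  # (field, value) -> set of matching doc ids


def cached_filter(field, value):
    """Regular fq: computed over the whole corpus once, then reusable by any query."""
    key = (field, value)
    if key not in filter_cache:
        filter_cache[key] = {d for d, f in DOCS.items() if f[field] == value}
    return filter_cache[key]


def check_candidates(candidates, field, value):
    """Non-cached style: test only the docs that survived the earlier clauses."""
    return {d for d in candidates if DOCS[d][field] == value}


# Cached path: three corpus-wide sets are built and intersected.
cached = (cached_filter("n_rea", "a1")
          & cached_filter("provincia", "MI")
          & cached_filter("type", "x"))

# Selective-first path. The first clause scans every doc here only because this
# toy has no index; after it, a single doc is left for the other two clauses.
survivors = check_candidates(DOCS.keys(), "n_rea", "a1")
survivors = check_candidates(survivors, "provincia", "MI")
survivors = check_candidates(survivors, "type", "x")

print(cached, survivors)
```

Both paths return the same docs. What differs is how much work the two less selective clauses do, and whether their results are left in the cache for the next query.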

Matteo:

My guess (and I'm not intimately familiar with the code) is that, indeed
the restrictive clause is helping you a lot here. Frankly I doubt if
adding a cost will make a measurable difference if the most restrictive
FQ clause is quite sparse....

I'm still puzzled why, in your test scenario, there is such a difference when
making all the filter queries cache=false. _Assuming_ that provincia and type
are relatively low-cardinality fields, they should all be in the
filterCache pretty quickly. But perhaps ANDing the bitsets together is more
expensive than the advantage in this case. I'd be curious as to the hit ratio
you were seeing.

But as you say, if the client is satisfied I'm not sure it's worth pursuing...

Best,
Erick

On Tue, Jan 5, 2016 at 11:09 AM, Matteo Grolla <ma...@gmail.com> wrote:
> Hi Erick,
>      the test was run on thousands of queries of that kind against millions
> of docs.
> I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU
> bound, and CPU was scarce).
> After that the customer was happy and time ran out, so I didn't go further,
> but cost is definitely something I'd try.
> When I saw the CloudSearch presentation, where they explained that they
> were enabling/disabling caching based on fq statistics, I thought this kind
> of problem was general enough that I might find a plugin already built.
>
> 2016-01-05 19:17 GMT+01:00 Erick Erickson <er...@gmail.com>:
>
>>
>> &fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>>
>> You have a comma in front of the last fq clause, typo?
>>
>> Well, the whole point of caching filter queries is so that the
>> _second_ time you use one, very little work has to be done. That comes
>> at a cost, of course, for first-time execution.
>> Basically any fq clause that you can guarantee won't be re-used should
>> have cache=false set.
>>
>> I'd be surprised if not caching the provincia and type fq clauses were
>> faster the second time you use them, but I've been surprised before.
>> I guess ANDing two bitsets together could take more time than, say,
>> testing a small number of individual documents....
>>
>> And I'm assuming that you're testing multiple queries rather than just
>> one-offs.
>>
>> If you _do_ know that some of your clauses are very restrictive, I
>> wonder what happens if you add a cost in. fq's are evaluated in cost
>> order (when cache=false), so what happens in this case?
>> &fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false cost=102}provincia:yyyy&fq={!cache=false cost=103}type:zzzz
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 5, 2016 at 9:41 AM, Matteo Grolla <ma...@gmail.com>
>> wrote:
>> > Thanks Erick and Binoy,
>> >      This is a case I stumbled upon: with queries like
>> >
>> > q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>> >
>> > where the n_rea filter is highly selective,
>> > I got a > 3x performance improvement by disabling the cache.
>> >
>> > I think it's because the last two filters are not so selective: they are
>> > resolved to two bitsets which are then ANDed together,
>> > and this is less efficient than leapfrogging, since the first filter has
>> > just one or two results.
>> > Does that make sense to you?
>> >
>> >
>> >
>> >
>> >
>> > 2016-01-05 16:59 GMT+01:00 Erick Erickson <er...@gmail.com>:
>> >
>> >> Matteo:
>> >>
>> >> Let's see if I understand your problem. Essentially you want
>> >> Solr to analyze the filter queries and decide through some
>> >> algorithm which ones to cache. I have a hard time thinking of
>> >> any general way to do this; certainly there's nothing in Solr
>> >> that does this automatically. As Binoy mentions, there are some
>> >> ways to influence what goes in the cache, but the algorithm is
>> >> simple....
>> >>
>> >> If you build such a thing, I suspect you'll be implicitly building
>> >> in knowledge of how your particular application uses Solr. For
>> >> sure, the functionality around "no cache filters" is there explicitly
>> >> because some fq clauses (think ACL calculations) can be
>> >> very expensive to calculate for the entire corpus (which is what
>> >> fqs do by default).
>> >>
>> >> But you really haven't given us any examples of what sorts
>> >> of fq clauses you consider "bad". Perhaps there are other ways
>> >> of approaching your problem.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >>
>> >> On Tue, Jan 5, 2016 at 7:50 AM, Binoy Dalal <bi...@gmail.com>
>> >> wrote:
>> >> > What is your exact requirement then?
>> >> > I ask, because these settings can solve the problems you've mentioned
>> >> > without the need to add any additional functionality.
>> >> >
>> >> > On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <matteo.grolla@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Binoy,
>> >> >>      I know these settings, but the problem I'm trying to solve is
>> >> >> when these settings aren't enough.
>> >> >>
>> >> >>
>> >> >> 2016-01-05 16:30 GMT+01:00 Binoy Dalal <bi...@gmail.com>:
>> >> >>
>> >> >> > If I understand your problem correctly, then you don't want the
>> >> >> > most frequently used fqs removed and you do not want your filter
>> >> >> > cache to grow to very large sizes.
>> >> >> > Well, there is already a solution for both of these.
>> >> >> > In the solrconfig.xml file, you can configure the <filterCache>
>> >> >> > parameter to suit your needs.
>> >> >> > a) Use the LeastFrequentlyUsed or LFU eviction policy.
>> >> >> > b) Set the size to whatever number of fqs you find suitable.
>> >> >> > You can do this like so:
>> >> >> > <filterCache class="solr.LFUCache" size="100" initialSize="10"
>> >> >> > autowarmCount="10"/>
>> >> >> > You should play around with these parameters to find the best
>> >> >> > combination for your implementation.
>> >> >> > For more details take a look here:
>> >> >> > https://wiki.apache.org/solr/SolrCaching
>> >> >> > http://yonik.com/advanced-filter-caching-in-solr/
>> >> >> > --
>> >> >> > Regards,
>> >> >> > Binoy Dalal
>> >> >> >
>> >> >>
>> >> > --
>> >> > Regards,
>> >> > Binoy Dalal
>> >>
>>

Re: enable disable filter query caching based on statistics

Posted by Matteo Grolla <ma...@gmail.com>.
Hi Erick,
     the test was run on thousands of queries of that kind against millions
of docs.
I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU
bound, and CPU was scarce).
After that the customer was happy and time ran out, so I didn't go further,
but cost is definitely something I'd try.
When I saw the CloudSearch presentation, where they explained that they
were enabling/disabling caching based on fq statistics, I thought this kind
of problem was general enough that I might find a plugin already built.
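
For what it's worth, a minimal sketch of what "caching based on fq statistics" could look like if done in the client that builds the requests. This is not what CloudSearch does (the presentation is the only source for that) and not an existing Solr plugin. The class name FqCachePolicy is hypothetical, and it counts repeats of each fq string as a stand-in for a real hit ratio, which is an assumption since nothing can be "hit" while caching is off.

```python
from collections import Counter


class FqCachePolicy:
    """Toy policy: caching is off unless an fq is among the N most repeated ones."""

    def __init__(self, top_n):
        self.top_n = top_n
        self.seen = Counter()  # fq string -> number of times it was requested

    def decorate(self, fq):
        """Record one use of fq and return it with or without {!cache=false}."""
        self.seen[fq] += 1
        hot = {q for q, _ in self.seen.most_common(self.top_n)}
        # Require at least one repeat so a brand-new fq is never cached.
        if fq in hot and self.seen[fq] > 1:
            return fq
        return "{!cache=false}" + fq


policy = FqCachePolicy(top_n=1)
print(policy.decorate("provincia:yyyy"))  # first sighting, not cached yet
print(policy.decorate("provincia:yyyy"))  # repeated, now left cacheable
print(policy.decorate("n_rea:xxx"))       # one-off, stays cache=false
```

A real version would need to age the counters and survive restarts, which is probably where the application-specific knowledge Erick mentions creeps in.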


Re: enable disable filter query caching based on statistics

Posted by Binoy Dalal <bi...@gmail.com>.
@Erick I might be wrong here, so please correct me if I am.
In the particular case Matteo has given, applying the filters as post filters
won't make any difference, since the query (q=*:*) is going to return all docs
anyway. In such a case won't applying fqs normally be the same as applying
them as post filters?
I assume that was your intention in setting the costs > 100.
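
For reference, a small Python sketch of how the request with costs could be assembled so that every fq is its own parameter (no stray comma). The field names, placeholder values and costs are the ones from this thread; the variable names are made up. One caveat, as far as I know: cache=false with cost >= 100 only gives a true post filter for query types that implement Solr's PostFilter interface (frange is the usual example), so for plain term filters like these the cost would only affect the order in which the non-cached filters are checked.

```python
from urllib.parse import urlencode, parse_qs

# (fq, cost) pairs from the thread; lower cost is evaluated first
# among the non-cached filters.
filters = [("n_rea:xxx", 101), ("provincia:yyyy", 102), ("type:zzzz", 103)]

params = [("q", "*:*")]
for fq, cost in filters:
    params.append(("fq", "{!cache=false cost=%d}%s" % (cost, fq)))

# urlencode joins every parameter with '&' and percent-escapes the braces,
# the '!' and the space inside the local params.
query_string = urlencode(params)
print(query_string)

# Round trip to check that the three filters survive as separate fq values.
print(parse_qs(query_string)["fq"])
```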

-- 
Regards,
Binoy Dalal

Re: enable disable filter query caching based on statistics

Posted by Erick Erickson <er...@gmail.com>.
&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz

You have a comma in front of the last fq clause, typo?

Well, the whole point of caching filter queries is that the _second_
time you use one, very little work has to be done. That comes at a
cost, of course, for first-time execution. Basically, any fq clause
that you can guarantee won't be reused should have cache=false set.

I'd be surprised if not caching were faster the second time you use
the provincia and type fq clauses, but I've been surprised before. I
guess ANDing two bitsets together could take more time than, say,
testing a small number of individual documents....

And I'm assuming that you're testing multiple queries rather than just one-offs.

If you _do_ know that some of your clauses are very restrictive, I
wonder what happens if you add a cost in. fq's are evaluated in cost
order (when cache=false), so what happens in this case?
&fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false cost=102}provincia:yyyy&fq={!cache=false cost=103}type:zzzz
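
A minimal sketch of how a client could build that request without hand-typing the separators (which is where the stray comma crept in). It assumes Python 3 and only the standard library; the helper name uncached_fq is hypothetical, and the field names and costs are the ones from the example above.

```python
from urllib.parse import urlencode, parse_qsl

def uncached_fq(clause, cost):
    """Prefix a plain fq clause with local params that skip the
    filter cache and set the evaluation cost."""
    return "{!cache=false cost=%d}%s" % (cost, clause)

# One ("fq", value) pair per filter; urlencode joins them with "&"
# and percent-encodes the braces, spaces and colons.
params = [
    ("q", "*:*"),
    ("fq", uncached_fq("n_rea:xxx", 101)),
    ("fq", uncached_fq("provincia:yyyy", 102)),
    ("fq", uncached_fq("type:zzzz", 103)),
]
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the select URL of whatever core is being tested. Decoding it again with parse_qsl gives back the three separate fq clauses.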

Best,
Erick

On Tue, Jan 5, 2016 at 9:41 AM, Matteo Grolla <ma...@gmail.com> wrote:
> Thanks Erik and Binoy,
>      This is a case I stumbled upon: with queries like
>
> q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>
> where n_rea filter is highly selective
> I was able to make > 3x performance improvement disabling cache
>
> I think it's because the last two filters are not so selective, they are
> resolved to two bitset which are then anded together
> and this is less efficient than leapfrogging since the first filter has
> just one or two results.
> Does it make sense to you?

Re: enable disable filter query caching based on statistics

Posted by Matteo Grolla <ma...@gmail.com>.
Thanks Erick and Binoy,
     This is a case I stumbled upon: with queries like

q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz

where the n_rea filter is highly selective, I got a more than 3x
performance improvement by disabling the cache.

I think it's because the last two filters are not very selective: with
caching on, each is resolved to a bitset and the two bitsets are then
ANDed together, which is less efficient than leapfrogging when the
first filter has just one or two results.
Does that make sense to you?
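
The intuition can be illustrated with a toy model. This is only a sketch under stated assumptions, in Python 3: it is not how Lucene implements either path. Filters are modelled as sorted lists of doc ids, the "bitset" path tests every document in the index, and the "leapfrog" path is a simplified stand-in that drives from the most selective list and seeks into the others. All function names and the sample data are made up for illustration.

```python
from bisect import bisect_left

def to_bitset(doc_ids, max_doc):
    """Expand a list of doc ids into one boolean per document."""
    bits = [False] * max_doc
    for doc in doc_ids:
        bits[doc] = True
    return bits

def intersect_bitsets(bitsets, max_doc):
    """AND the bitsets together; work is proportional to max_doc."""
    steps, result = 0, []
    for doc in range(max_doc):
        steps += 1
        if all(bits[doc] for bits in bitsets):
            result.append(doc)
    return result, steps

def intersect_leapfrog(postings):
    """Drive from the shortest sorted list and seek in the others;
    work is proportional to the size of the most selective filter."""
    postings = sorted(postings, key=len)
    lead, others = postings[0], postings[1:]
    steps, result = 0, []
    for doc in lead:
        found = True
        for plist in others:
            steps += 1
            i = bisect_left(plist, doc)  # first id >= doc
            if i == len(plist) or plist[i] != doc:
                found = False
                break
        if found:
            result.append(doc)
    return result, steps

max_doc = 100000
n_rea = [4242, 77777]                   # highly selective filter
provincia = list(range(0, max_doc, 2))  # matches half the index
doc_type = list(range(0, max_doc, 3))   # matches a third of the index

bitset_result, bitset_steps = intersect_bitsets(
    [to_bitset(p, max_doc) for p in (n_rea, provincia, doc_type)], max_doc)
leap_result, leap_steps = intersect_leapfrog([n_rea, provincia, doc_type])
print(bitset_result == leap_result, bitset_steps, leap_steps)
```

Both paths return the same documents, but the leapfrog path touches only a handful of entries because the selective filter leaves it almost nothing to check. Real bitset intersection works a machine word at a time and is much cheaper per document than this loop, so the model shows the shape of the trade-off, not its size.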





2016-01-05 16:59 GMT+01:00 Erick Erickson <er...@gmail.com>:

> Matteo:
>
> Let's see if I understand your problem. Essentially you want
> Solr to analyze the filter queries and decide through some
> algorithm which ones to cache. I have a hard time thinking of
> any general way to do this; certainly there's nothing in Solr
> that does this automatically. As Binoy mentions there are some
> ways to influence what goes in the cache, but the algorithm is
> simple....
>
> If you build such a thing, I suspect you'll be implicitly building
> in knowledge of how your particular application uses Solr. For
> sure, the functionality around "no cache filters" is there explicitly
> because some fq clauses (think ACL calculations) can be
> very expensive to calculate for the entire corpus (which is what
> fqs do by default).
>
> But you really haven't given us some examples of what sorts
> of fq clauses you consider "bad". Perhaps there are other ways
> of approaching your problem.
>
> Best,
> Erick

Re: enable disable filter query caching based on statistics

Posted by Erick Erickson <er...@gmail.com>.
Matteo:

Let's see if I understand your problem. Essentially you want
Solr to analyze the filter queries and decide through some
algorithm which ones to cache. I have a hard time thinking of
any general way to do this; certainly there's nothing in Solr
that does this automatically. As Binoy mentions, there are some
ways to influence what goes in the cache, but the algorithm is
simple....

If you build such a thing, I suspect you'll be implicitly building
in knowledge of how your particular application uses Solr. For
sure, the functionality around "no cache filters" is there explicitly
because some fq clauses (think ACL calculations) can be
very expensive to calculate for the entire corpus (which is what
fqs do by default).
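
To make the idea concrete, here is a minimal sketch of a statistics-based decision made on the client side, before the request reaches Solr. Everything in it is hypothetical: the class FqCacheAdvisor and its methods are not part of Solr or SolrJ, and it assumes fq clauses are plain field:value strings with no local params of their own. It counts how often each clause has been seen and marks everything outside the N most reused clauses as cache=false.

```python
from collections import Counter

class FqCacheAdvisor:
    """Hypothetical helper: tracks fq reuse and disables caching
    for clauses that are not among the most reused ones."""

    def __init__(self, top_n):
        self.top_n = top_n
        self.seen = Counter()

    def record(self, fq):
        """Note one more use of this fq clause."""
        self.seen[fq] += 1

    def rewrite(self, fq):
        """Return fq unchanged if it is one of the top_n clauses and
        has been used more than once; otherwise prefix it with
        {!cache=false}."""
        hot = {clause for clause, count in self.seen.most_common(self.top_n)
               if count > 1}
        if fq in hot:
            return fq
        return "{!cache=false}" + fq

advisor = FqCacheAdvisor(top_n=2)
for clause in ["type:zzzz"] * 3 + ["provincia:yyyy"] * 2 + ["n_rea:1", "n_rea:2"]:
    advisor.record(clause)

print(advisor.rewrite("type:zzzz"))
print(advisor.rewrite("n_rea:1"))
```

Even this small version bakes in application knowledge: the choice of N, the "used more than once" threshold, and the assumption that past reuse predicts future reuse.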

But you really haven't given us any examples of what sorts
of fq clauses you consider "bad". Perhaps there are other ways
of approaching your problem.

Best,
Erick


On Tue, Jan 5, 2016 at 7:50 AM, Binoy Dalal <bi...@gmail.com> wrote:
> What is your exact requirement then?
> I ask, because these settings can solve the problems you've mentioned
> without the need to add any additional functionality.
> --
> Regards,
> Binoy Dalal

Re: enable disable filter query caching based on statistics

Posted by Binoy Dalal <bi...@gmail.com>.
What is your exact requirement then?
I ask, because these settings can solve the problems you've mentioned
without the need to add any additional functionality.

On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <ma...@gmail.com>
wrote:

> Hi Binoy,
>      I know these settings but the problem I'm trying to solve is when
> these settings aren't enough.
-- 
Regards,
Binoy Dalal

Re: enable disable filter query caching based on statistics

Posted by Matteo Grolla <ma...@gmail.com>.
Hi Binoy,
     I know these settings but the problem I'm trying to solve is when
these settings aren't enough.


2016-01-05 16:30 GMT+01:00 Binoy Dalal <bi...@gmail.com>:

> If I understand your problem correctly, then you don't want the most
> frequently used fqs removed and you do not want your filter cache to grow
> to very large sizes.
> Well there is already a solution for both of these.
> In the solrconfig.xml file, you can configure the <filterCache> parameter
> to suit your needs.
> a) Use the LeastFrequentlyUsed or LFU eviction policy.
> b) Set the size to whatever number of fqs you find suitable.
> You can do this like so:
> <filterCache class="solr.LFUCache" size="100" initialSize="10"
> autowarmCount="10"/>
> You should play around with these parameters to find the best combination
> for your implementation.
> For more details take a look here:
> https://wiki.apache.org/solr/SolrCaching
> http://yonik.com/advanced-filter-caching-in-solr/
> --
> Regards,
> Binoy Dalal
>

Re: enable disable filter query caching based on statistics

Posted by Binoy Dalal <bi...@gmail.com>.
If I understand your problem correctly, then you don't want the most
frequently used fqs removed and you do not want your filter cache to grow
to very large sizes.
Well there is already a solution for both of these.
In the solrconfig.xml file, you can configure the <filterCache> parameter
to suit your needs.
a) Use the LeastFrequentlyUsed or LFU eviction policy.
b) Set the size to whatever number of fqs you find suitable.
You can do this like so:
<filterCache class="solr.LFUCache" size="100" initialSize="10"
autowarmCount="10"/>
You should play around with these parameters to find the best combination
for your implementation.
For more details take a look here:
https://wiki.apache.org/solr/SolrCaching
http://yonik.com/advanced-filter-caching-in-solr/
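
Why the eviction policy matters here can be seen in a toy simulation. This is a sketch in Python 3, not Solr's LFUCache or LRU implementation: the function simulate and the request pattern are assumptions chosen to mimic a few reused filters mixed with many one-off filters.

```python
from collections import OrderedDict

def simulate(policy, size, requests):
    """Count hits for a toy cache using 'lru' or 'lfu' eviction."""
    cache = OrderedDict()  # key -> use count, ordered by recency of use
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache[key] += 1
            cache.move_to_end(key)
            continue
        if len(cache) >= size:
            if policy == "lru":
                victim = next(iter(cache))          # least recently used
            else:
                victim = min(cache, key=cache.get)  # least frequently used
            del cache[victim]
        cache[key] = 1
    return hits

# Two filters that keep coming back, followed each round by two
# filters that are never seen again.
requests = []
for i in range(50):
    requests += ["type:a", "provincia:b", "type:a", "provincia:b",
                 "n_rea:%d" % (2 * i), "n_rea:%d" % (2 * i + 1)]

lru_hits = simulate("lru", 3, requests)
lfu_hits = simulate("lfu", 3, requests)
print(lru_hits, lfu_hits)
```

With a cache this small, the one-off filters keep pushing the reused ones out under LRU, while LFU evicts the one-off entries instead because their use count never rises above one.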


On Tue, Jan 5, 2016 at 7:28 PM Matteo Grolla <ma...@gmail.com>
wrote:

> Hi,
>     after looking at the presentation of cloudsearch from lucene revolution
> 2014
>
> https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
> min 17:08
>
> I recognized I'd love to be able to remove the burden of disabling filter
> query caching from developers
>
> the problem:
> Solr by default caches filter queries
> a) When there are filter queries that are not reused and few that are the
> good ones get evicted unnecessarily
> b) if the same query has multiple filter queries that are very selective I
> noticed a big performance disabling cache
> c) I'd like to spare developers from deciding what has to be cached or not
>
> the question:
> -Is there anything already working to solve those problems?
>
> what do you think about this?
> -I was thinking to write a plugin to recognize query types with regular
> exception and let solr admins associate a caching behaviour with each query
> type
> -another idea was to
>    -by default set fq caching off
>    -keep statistics about fq
>    -enable caching only for the N fq with highest hit ratio
>
-- 
Regards,
Binoy Dalal