Posted to solr-user@lucene.apache.org by Jeff Wartes <jw...@whitepages.com> on 2015/10/01 18:43:58 UTC

Facet queries blow out the filterCache

I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud
index on fields like this:

<field name="city" type="string" indexed="true" stored="false"
docValues="true"/>

that look something like this:
q=...&fl=id,score&facet.field=city&facet=true&f.city.facet.mincount=1&f.city.facet.limit=50&rows=0&start=0&facet.method=fc

(no, NOT facet.method=enum - the usage of the filterCache there is pretty
well documented)

Watching the filterCache stats, it appears that every one of these queries
causes the "inserts" counter to be incremented by one. Distinct "q="
queries also increase the "size", and eviction happens as normal. If I
repeat the same query a few times, "lookups" is not incremented, so these
entries generally appear to be completely wasted. (When running a lot of
these queries, a very small set of them does also increment the "lookups"
counter, and I haven’t figured out why those are special.)

So the question is, why does this facet query have anything to do with the
filterCache? This causes a huge amount of filterCache churn with no
apparent benefit.
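
For anyone wanting to reproduce the counter-watching described above, here is a minimal Python sketch. It assumes the stats are read from the core's admin/mbeans handler with cat=CACHE, and that the JSON response is a flat list alternating category names and bodies. Both are assumptions about the setup rather than anything stated in this thread, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def mbeans_url(base, core):
    """Build a cache-stats URL for one core (handler path is an assumption)."""
    params = {"cat": "CACHE", "stats": "true", "wt": "json"}
    return f"{base}/{core}/admin/mbeans?{urlencode(params)}"


def filter_cache_stats(payload):
    """Extract the four counters discussed above from a parsed response."""
    beans = payload["solr-mbeans"]
    # Assumed layout: [category name, category body, category name, ...]
    categories = dict(zip(beans[0::2], beans[1::2]))
    stats = categories["CACHE"]["filterCache"]["stats"]
    return {k: stats[k] for k in ("lookups", "hits", "inserts", "size")}


def delta(before, after):
    """Per-counter change between two snapshots taken around one query."""
    return {k: after[k] - before[k] for k in before}
```

Taking one snapshot before a query and one after, then calling delta(), shows directly whether a request produced an insert without a matching lookup.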


Re: Facet queries blow out the filterCache

Posted by Charlie Hull <ch...@flax.co.uk>.
On 01/10/2015 23:31, Jeff Wartes wrote:
> It still inserts if I address the core directly and use distrib=false.
>
> I’ve got a few collections sharing the same config, so it’s surprisingly
> annoying to change solrconfig.xml right now, but it seemed pretty clear
> the query is the thing being cached, since the cache size only changes
> when the query does.

Hi Jeff,

I think you may be hitting the same issue we found:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/%3CCAGe-MLJ+6y1at+OunK3sGaCFF6zGtJq_Nin9_3SHN0kFuqXkBA@mail.gmail.com%3E

Distributed faceting uses the filter cache, where you wouldn't expect it 
to. The solution was to set facet.limit to -1.

Best

Charlie


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: Facet queries blow out the filterCache

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
This insert is caused by
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1505

Off-topic thought:
showItems is useless, because its output now looks like

   - item_name:foo:org.apache.solr.search.SortedIntDocSet@2e1fbd46

Shouldn't it be improved?




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Facet queries blow out the filterCache

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Jeff,
so far the test routine is reasonable, but since we are counting a facet,
we expect that filtering by one of these values will be used in the
following requests. I suppose the next request with fq=popularity:1 or so
might show reuse of that cached filter, but that's just my speculation.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Facet queries blow out the filterCache

Posted by Jeff Wartes <jw...@whitepages.com>.
FWIW, since it seemed like there was at least one bug here (and possibly
more), I filed
https://issues.apache.org/jira/browse/SOLR-8171





Re: Facet queries blow out the filterCache

Posted by Jeff Wartes <jw...@whitepages.com>.
I dug far enough yesterday to find the GET_DOCSET, but not far enough to
find why. Thanks, a little context is really helpful sometimes.


So, starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo&rows=1&facet=true&facet.field=popularity

New values: lookups: 0, hits: 0, inserts: 1, size: 1

So for the reasons you explained, "inserts" is incremented for this new
search.

http://localhost:8983/solr/techproducts/select?q=name:boo&rows=1&facet=true&facet.field=popularity

New values: lookups: 0, hits: 0, inserts: 2, size: 2


Another new search, another new insert. No "lookups" though, so how does
it know name:boo wasn’t cached?

http://localhost:8983/solr/techproducts/select?q=name:boo&rows=1&facet=true&facet.field=popularity

New values: lookups: 1, hits: 1, inserts: 2, size: 2


But it clearly does know - when I repeat the search, I get both a lookup
and a hit. (and no insert) So is this just
a bug in the stats reporting, perhaps?
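
To make the arithmetic behind that question concrete, here is a small Python sketch that computes the hit ratio from the three snapshots reported above. The hit_ratio helper is hypothetical; only the counter values come from this message.

```python
def hit_ratio(stats):
    """Return hits / lookups, or None when nothing has been looked up yet."""
    if stats["lookups"] == 0:
        return None
    return stats["hits"] / stats["lookups"]


# Counter values copied from the three requests above.
observed = [
    {"lookups": 0, "hits": 0, "inserts": 1, "size": 1},  # q=name:foo
    {"lookups": 0, "hits": 0, "inserts": 2, "size": 2},  # q=name:boo
    {"lookups": 1, "hits": 1, "inserts": 2, "size": 2},  # q=name:boo, repeated
]

ratios = [hit_ratio(s) for s in observed]
```

Because the two misses are never counted as lookups, the ratio is undefined for the first two requests and 1.0 after the third, which matches the "hit ratio is always exactly 1" observation.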


When I first started looking at this, it was in a solrcloud cluster, and
one interesting thing about that cluster is that it was configured with
the queryResultCache turned off, so let’s repeat the above experiment
without the queryResultCache. (I’m just commenting it out in the
techproducts config for this run.)


Starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo&rows=1&facet=true&facet.field=popularity

New values: lookups: 0, hits: 0, inserts: 1, size: 1

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo&rows=1&facet=true&facet.field=popularity

New values: lookups: 0, hits: 0, inserts: 2, size: 2

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo&rows=1&facet=true&facet.field=popularity

New values: lookups: 0, hits: 0, inserts: 3, size: 2

No cache hit! We get an insert instead, but it’s already in there, so the
size doesn’t change. So disabling the queryResultCache apparently causes
facet queries to be unable to use the filterCache?




I’m increasingly thinking that different use cases need different
filterCaches, rather than trying to bundle every explicit or unexpected
use case under one cache with one size and one regenerator.








Re: Facet queries blow out the filterCache

Posted by Chris Hostetter <ho...@fucit.org>.
: So, no SolrCloud, default example config, about as basic as you get. I
: didn’t even bother indexing any docs. Then I issued this query:
: 
: http://localhost:8983/solr/techproducts/select?q=name:foo&rows=1&facet=true&facet.field=popularity&facet.mincount=0&facet.limit=-1

: This still causes an insert into the filterCache.

the faceting component is a type of operation that indicates in the 
QueryCommand that it needs to GET_DOCSET for the set of all documents 
matching the query (independent of pagination) -- the point of this DocSet 
is so the faceting logic can then compute the intersection of the set of 
all matching documents with the set of documents matching each facet 
constraint.  the cached DocSet will be re-used both within the context 
of the current request, and in future facet requests over the 
same query+filters.
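
The idea in the paragraph above can be sketched in a few lines of Python. This is a toy model only: ToyDocSetCache, facet_counts and the sample data are hypothetical, and Solr's real faceting code is considerably more involved. It shows a DocSet being cached per query and then intersected with the documents matching each facet constraint.

```python
class ToyDocSetCache:
    """Stand-in for the filterCache: maps a query string to a set of doc ids."""

    def __init__(self):
        self.entries = {}
        self.inserts = 0
        self.hits = 0

    def get_docset(self, query, compute):
        # Reuse the cached set when the same query is seen again.
        if query in self.entries:
            self.hits += 1
            return self.entries[query]
        docset = frozenset(compute(query))
        self.entries[query] = docset
        self.inserts += 1
        return docset


def facet_counts(base_docset, term_docsets):
    """For each facet term, count docs that match both the query and the term."""
    return {term: len(base_docset & docs) for term, docs in term_docsets.items()}


# Made-up index: which docs match each query, and each popularity value.
NAME_INDEX = {"name:foo": {1, 2, 3}, "name:boo": {3, 4}}
POPULARITY = {"1": {1, 4}, "6": {2, 3}, "10": {5}}

cache = ToyDocSetCache()
base = cache.get_docset("name:foo", lambda q: NAME_INDEX[q])
counts = facet_counts(base, POPULARITY)

# A second facet request over the same query reuses the cached set.
cache.get_docset("name:foo", lambda q: NAME_INDEX[q])
```

In this model the first request costs one insert and the repeat costs one hit, which is the reuse described above.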

: The only real difference I’m noticing vs my solrcloud collection is that
: repeating the query increments cache lookups and hits. It’s still odd
: though, because issuing new distinct queries causes a reported insert, but
: not a lookup, so the cache hit ratio is always exactly 1.

i'm not following what you are saying at all ... can you give some 
concrete examples (ie: "starting with an empty cache i do this request, 
then i see these cache stats, then i do this identical/different query and 
then the cache stats look like this...")



-Hoss
http://www.lucidworks.com/

Re: Facet queries blow out the filterCache

Posted by Jeff Wartes <jw...@whitepages.com>.
I backed up a bit. I took the stock solr download and did this:

solr-5.3.1>$ bin/solr -e techproducts

So, no SolrCloud, default example config, about as basic as you get. I
didn’t even bother indexing any docs. Then I issued this query:

http://localhost:8983/solr/techproducts/select?q=name:foo&rows=1&facet=true&facet.field=popularity&facet.mincount=0&facet.limit=-1


This still causes an insert into the filterCache.

The only real difference I’m noticing vs my solrcloud collection is that
repeating the query increments cache lookups and hits. It’s still odd
though, because issuing new distinct queries causes a reported insert, but
not a lookup, so the cache hit ratio is always exactly 1.





Re: Facet queries blow out the filterCache

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2015-10-01 at 22:31 +0000, Jeff Wartes wrote:
> It still inserts if I address the core directly and use distrib=false.

It is quite strange that it is triggered with direct access. If that
can be reproduced in a test, it looks like a performance optimization
waiting to be done.

Anyway, operating under the assumption that the single-core facet
request for some reason acts as a distributed call, the key to avoiding
the fine-counting is to ensure that _all_ possibly relevant term counts
have been returned in the first facet phase.

Try setting both facet.mincount=0 and facet.limit=-1.
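
A request with both settings could be assembled as in the Python sketch below. The facet_url helper and its defaults are hypothetical; the parameter names are the ones used in this thread, and the host, core and field are taken from the techproducts example used elsewhere in it.

```python
from urllib.parse import urlencode


def facet_url(base, core, q, field, mincount=0, limit=-1):
    """Build a select URL that asks for every term count in one pass."""
    params = [
        ("q", q),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),  # 0 keeps terms with no matches
        ("facet.limit", limit),        # -1 removes the cap on returned terms
    ]
    return f"{base}/{core}/select?{urlencode(params)}"


url = facet_url("http://localhost:8983/solr", "techproducts", "name:foo", "popularity")
```

Note that urlencode percent-encodes the colon in the query, so q=name:foo is sent as q=name%3Afoo.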

- Toke Eskildsen, State and University Library, Denmark



Re: Facet queries blow out the filterCache

Posted by Jeff Wartes <jw...@whitepages.com>.
It still inserts if I address the core directly and use distrib=false.

I’ve got a few collections sharing the same config, so it’s surprisingly
annoying to change solrconfig.xml right now, but it seemed pretty clear
the query is the thing being cached, since the cache size only changes
when the query does.





Re: Facet queries blow out the filterCache

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
hm..
This option was useful for introspecting cache content:
https://wiki.apache.org/solr/SolrCaching#showItems It might help you
find out the cause.
I'm still blaming distributed requests; it's explained here:
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-RequestParameters
E.g., does it happen if you run with distrib=false?

On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jw...@whitepages.com> wrote:

>
> No change, still shows an insert per-request. As does a simplified request
> with only the facet params
> "&facet.field=city&facet=true"
>
facet.limit is 100 by default
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.limitParameter
and can cause filtering by values; that can be seen in the logs, btw.

>
> It’s definitely facet related though, facet=false eliminates the insert.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Facet queries blow out the filterCache

Posted by Jeff Wartes <jw...@whitepages.com>.
No change, still shows an insert per-request. As does a simplified request
with only the facet params
"&facet.field=city&facet=true"

It’s definitely facet related though, facet=false eliminates the insert.





Re: Facet queries blow out the filterCache

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
what if you set f.city.facet.limit=-1 ?



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>