Posted to solr-user@lucene.apache.org by Charlie Hull <ch...@flax.co.uk> on 2014/10/01 10:24:00 UTC

Re: Filter cache pollution during sharded edismax queries

On 30/09/2014 22:25, Erick Erickson wrote:
> Just from a 20,000 ft. view, using the filterCache this way seems...odd.
>
> +1 for using a different cache, but that's being quite unfamiliar with the
> code.

Here's a quick update:

1. LFUCache performed worse, so we returned to LRUCache.
2. Making the cache smaller than the default 512 reduced performance.
3. Raising the cache size to 2048 had no significant effect on 
performance, but did reduce CPU load significantly. This may help our 
client, as they can reduce their system spec considerably.

We're continuing to test with our client, but the upshot is this: even if 
you think you don't need the filter cache, if you're doing distributed 
faceting you probably do, and you should size it based on 
experimentation. In our case there is only a single filter, but the 
cache needs to be considerably larger than that!

Cheers

Charlie

>
> On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward <al...@flax.co.uk> wrote:
>
>>
>>>
>>>> Once all the facets have been gathered, the co-ordinating node then asks
>>>> the subnodes for an exact count for the final top-N facets,
>>>
>>>
>>> What's the point of refining these counts? I thought it only makes
>>> sense for facet.limit-ed requests. Is that correct? Could those who
>>> suffer from low performance simply remove the facet.limit to avoid
>>> that distributed hop?
>>
>> Presumably yes, but if you've got a sufficiently high cardinality field
>> then any gains made by missing out the hop will probably be offset by
>> having to stream all the return values back again.
>>
>> Alan
>>
>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>> <mk...@griddynamics.com>
>>
>>
>


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: Filter cache pollution during sharded edismax queries

Posted by Charlie Hull <ch...@flax.co.uk>.
On 01/10/2014 09:55, jim ferenczi wrote:
> I think you should test with facet.shard.limit=-1: this will disable the
> per-shard facet limit and remove the need for facet refinements. I bet
> that returning every facet with a count greater than 0 on internal
> queries is cheaper than using the filter cache to handle a lot of
> refinements.

I'm happy to report that in our case setting facet.limit=-1 
significantly improved performance and cache hit ratios, and reduced CPU 
load. Thanks to all who replied!

Cheers

Charlie
Flax
>
> Jim



Re: Filter cache pollution during sharded edismax queries

Posted by jim ferenczi <ji...@gmail.com>.
I think you should test with facet.shard.limit=-1: this will disable the
per-shard facet limit and remove the need for facet refinements. I bet
that returning every facet with a count greater than 0 on internal
queries is cheaper than using the filter cache to handle a lot of
refinements.

Jim
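For reference, a request with the per-shard facet limit disabled might be built like this. This is only a sketch: the collection name (`products`), field name (`category`), and host are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical distributed facet query; collection, field and host are made up.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",
    # -1 removes the per-shard limit, so each shard returns every term with a
    # non-zero count and the coordinator never needs a refinement round-trip.
    "facet.shard.limit": "-1",
}
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
print(url)
```

The trade-off Alan mentioned applies here: on a high-cardinality field, streaming every non-zero term back from each shard can cost more than the refinement hop it avoids.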
