Posted to solr-user@lucene.apache.org by Charlie Hull <ch...@flax.co.uk> on 2014/10/01 10:24:00 UTC
Re: Filter cache pollution during sharded edismax queries
On 30/09/2014 22:25, Erick Erickson wrote:
> Just from a 20,000 ft. view, using the filterCache this way seems...odd.
>
> +1 for using a different cache, but that's being quite unfamiliar with the
> code.
Here's a quick update:
1. LFUCache performed worse, so we returned to LRUCache.
2. Making the cache smaller than the default 512 entries reduced performance.
3. Raising the cache size to 2048 didn't have a significant effect on
performance, but it did reduce CPU load significantly. This may help our
client, as they can reduce their system spec considerably.
We're continuing to test with our client, but the upshot is this: even if
you think you don't need the filter cache, if you're doing distributed
faceting you probably do, and you should size it based on experimentation.
In our case there is only a single filter, yet the cache needs to be
considerably larger than that!
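For reference, the sizing above corresponds to a filterCache entry in
solrconfig.xml along these lines (a sketch only: the class choice and size
come from our experiments, while initialSize and autowarmCount are
illustrative defaults, not taken from our actual config):

```xml
<!-- Sized per the experiments above: 2048 entries reduced CPU load,
     and solr.LRUCache outperformed solr.LFUCache for this workload. -->
<filterCache class="solr.LRUCache"
             size="2048"
             initialSize="2048"
             autowarmCount="0"/>
```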
Cheers
Charlie
>
> On Tue, Sep 30, 2014 at 1:53 PM, Alan Woodward <al...@flax.co.uk> wrote:
>
>>
>>>
>>>> Once all the facets have been gathered, the co-ordinating node then asks
>>>> the subnodes for an exact count for the final top-N facets,
>>>
>>>
>>> What's the point of refining these counts? I thought it only makes
>>> sense for facet.limit-ed requests. Is that correct? Could those who
>>> suffer from low performance simply unlimit facet.limit to avoid that
>>> distributed hop?
>>
>> Presumably yes, but if you've got a sufficiently high-cardinality field
>> then any gains from skipping the hop will probably be offset by having
>> to stream all the return values back again.
>>
>> Alan
>>
>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>> <mk...@griddynamics.com>
>>
>>
>
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Re: Filter cache pollution during sharded edismax queries
Posted by Charlie Hull <ch...@flax.co.uk>.
On 01/10/2014 09:55, jim ferenczi wrote:
> I think you should test with facet.shard.limit=-1: this disables the
> facet limit on the shards and removes the need for facet refinements. I
> bet that returning every facet value with a count greater than 0 on the
> internal queries is cheaper than using the filter cache to handle a lot
> of refinements.
I'm happy to report that in our case setting facet.limit=-1 had a
significant positive impact on performance and cache hit ratios, and
reduced CPU load. Thanks to all who replied!
Cheers
Charlie
Flax
Re: Filter cache pollution during sharded edismax queries
Posted by jim ferenczi <ji...@gmail.com>.
I think you should test with facet.shard.limit=-1: this disables the facet
limit on the shards and removes the need for facet refinements. I bet that
returning every facet value with a count greater than 0 on the internal
queries is cheaper than using the filter cache to handle a lot of
refinements.
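As a concrete illustration, a distributed faceting request with the
per-shard limit disabled might be built like this (the host, collection,
and field names are made up for the example; only the facet parameters
come from this thread):

```shell
# Build a query URL that disables the per-shard facet limit, so each shard
# returns every term with a non-zero count and no refinement pass is needed.
# Host, collection, and field names are hypothetical.
SOLR="http://localhost:8983/solr/mycollection/select"
PARAMS="q=*:*&rows=0&facet=true&facet.field=category"
PARAMS="${PARAMS}&facet.limit=100&facet.shard.limit=-1"
echo "${SOLR}?${PARAMS}"
```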
Jim