Posted to dev@lucene.apache.org by Daniele Antuzi <da...@gmail.com> on 2021/12/31 16:31:16 UTC

[Solr] does not use the filterCache

Hi,
I was taking a look at the Solr searcher to see how the filterCache is
used:
https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1379-L1398
Reading the code, it turned out that the filterCache is not used if the
sort contains the score or if no sort is specified (by default, results
are sorted by score).
As far as I know, the filterCache contains an unordered set of documents so
the sort must be calculated after the application of the filter query.
The score should then also be computed after the filter query, so that it
is calculated on a smaller set of documents.
That being said, I don't understand why Solr does not use the filterCache
if the score is somehow involved in the sort.
In theory, it could:

   1. apply the filter query, reducing the number of results
   2. compute the score
   3. sort the results

Am I missing something?

Happy new year,
Daniele

RE: [Solr] does not use the filterCache

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

Actually, sorting requires that you set up the whole query and all of its iterators. Sure, you could then step over documents that are not in the cache, but the query has to be executed to actually do the sorting; you can only use the bitset to perhaps step forward more quickly. You can do this inside Solr: take the filterCache bitset, apply it as a FILTER clause to the main query, and assign it a good cost so that it leads the iteration. It will then leap-frog over all documents that are not in the cache. But the actual speed benefit is negligible due to the added execution complexity.

I think Daniele Antuzi is not fully aware of how queries and result collection work in Lucene; it looks like a misunderstanding of how this differs from databases (databases first collect all results and then sort them like a big array of rows). Once you understand that in Lucene everything is iterator-based and sorting works with a priority queue, you quickly see that the filterCache does not really help with sorting, because you need the query's iterators to calculate scores anyway.
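
To make the iterator and priority-queue point concrete, here is a minimal, self-contained Java sketch. It is not Lucene or Solr code: the class LeapFrogTopK, the method topK, and the sorted-array stand-in for the query's iterator are assumptions made purely for illustration. It shows that the cached bitset can only help to skip documents, while the scores still come from the query's own matches and the best hits are kept in a bounded priority queue.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PriorityQueue;

final class LeapFrogTopK {

    /**
     * queryDocs holds the main query's matching doc ids in ascending order and
     * queryScores the score of each match (hypothetical stand-ins for a scorer).
     * filter plays the role of a cached filter bitset.
     * Returns the ids of the k best-scoring docs that pass the filter, best first.
     */
    static int[] topK(int[] queryDocs, float[] queryScores, BitSet filter, int k) {
        // Min-heap of positions into queryDocs: the head is the weakest hit kept so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(queryScores[a], queryScores[b]));
        int i = 0;
        int f = filter.nextSetBit(0);
        while (i < queryDocs.length && f >= 0) {
            int q = queryDocs[i];
            if (q < f) {
                // The query lags behind: jump it forward to the first doc >= f.
                int pos = Arrays.binarySearch(queryDocs, i + 1, queryDocs.length, f);
                i = pos >= 0 ? pos : -pos - 1;
            } else if (q > f) {
                // The filter lags behind: jump it forward to the first set bit >= q.
                f = filter.nextSetBit(q);
            } else {
                // Both agree on this doc: only now is its score needed.
                heap.offer(i);
                if (heap.size() > k) {
                    heap.poll(); // drop the weakest hit
                }
                i++;
                f = filter.nextSetBit(q + 1);
            }
        }
        // Drain weakest-first, filling the result from the back.
        int[] top = new int[heap.size()];
        for (int p = top.length - 1; p >= 0; p--) {
            top[p] = queryDocs[heap.poll()];
        }
        return top;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(4);
        filter.set(8);
        filter.set(9);
        int[] docs = {1, 3, 5, 8, 9};
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        System.out.println(Arrays.toString(topK(docs, scores, filter, 2))); // prints [3, 8]
    }
}
```

Note that the query side is still walked and scored in this sketch; the bitset only decides which matches are skipped.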

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Shawn Heisey <ap...@elyograg.org>
> Sent: Tuesday, January 4, 2022 5:21 AM
> To: dev@solr.apache.org
> Subject: Re: [Solr] does not use the filterCache
> 
> On 1/3/2022 5:00 PM, David Smiley wrote:
> > The filter cache contains unsorted lists of docs; an entry ultimately
> > needs to be sorted to what the user wants.  The score in particular
> > requires actually running the query, at which point there isn't a point
> > in using the filter cache.  Well sort of; I could imagine a hybrid to
> > visit only the matching docs but that would add complexity.
> 
> I know that a filter query does not affect the scores in search results.
>   Filters decide which docs in the query result will be
> included/excluded in the final resultset, and do not influence the score.
> 
> Thinking about that ... I can't imagine why sorting by score would
> preclude using the filterCache.  The scores come from the main query,
> not the filters.  But there is likely to be some aspect to Lucene
> internals that I know nothing about -- my knowledge of those internals
> is very limited.
> 
> Thanks,
> Shawn
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
> For additional commands, e-mail: dev-help@solr.apache.org




Re: [Solr] does not use the filterCache

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/3/2022 5:00 PM, David Smiley wrote:
> The filter cache contains unsorted lists of docs; an entry ultimately 
> needs to be sorted to what the user wants.  The score in particular 
> requires actually running the query, at which point there isn't a point 
> in using the filter cache.  Well sort of; I could imagine a hybrid to 
> visit only the matching docs but that would add complexity.

I know that a filter query does not affect the scores in search results.
Filters decide which docs in the query result will be included/excluded
in the final resultset, and do not influence the score.

Thinking about that ... I can't imagine why sorting by score would 
preclude using the filterCache.  The scores come from the main query, 
not the filters.  But there is likely to be some aspect to Lucene 
internals that I know nothing about -- my knowledge of those internals 
is very limited.

Thanks,
Shawn



Re: [Solr] does not use the filterCache

Posted by David Smiley <ds...@apache.org>.
Daniele,

The filter cache contains unsorted lists of docs; an entry ultimately needs
to be sorted to what the user wants.  The score in particular requires
actually running the query, at which point there isn't a point in using the
filter cache.  Well sort of; I could imagine a hybrid to visit only the
matching docs but that would add complexity.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jan 3, 2022 at 2:30 PM Daniele Antuzi <da...@gmail.com>
wrote:

> Hi Mikhail,
> Thanks for your reply.
> Probably I wasn't clear enough. In the piece of code I pointed out
> <https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1387-L1398>,
> the searcher decides whether to use (or not use) the filterCache by setting
> the boolean *useFilterCache*.
>
> The searcher will use the filterCache in the search only if
>
>    - the filterCache exists
>    - AND the flags *GET_SCORES* and *NO_CHECK_FILTERCACHE* are not set
>    - AND the parameter *useFilterForSortedQuery* is true (it is false by
>    default, and I don't really understand why)
>    - AND the sort is not null
>    - AND none of the sort clauses contains the score
>
> If I am not mistaken, a null sort means that the resultset is sorted by
> score.
> So, if the resultset is sorted implicitly or explicitly by score, the
> searcher does not use the filterCache. Does anyone know why?

Re: [Solr] does not use the filterCache

Posted by Daniele Antuzi <da...@gmail.com>.
Hi Mikhail,
Thank you for your clarification. It has been really useful.
I understood what my problem was. I used the debugger, adding a breakpoint
in the get() method of CaffeineCache. Execution stopped there only when the
flag useFilterCache == true.
So the breakpoint not being hit, together with the name of the flag
useFilterCache, led me to think that the searcher doesn't use the
filterCache if useFilterCache == false.
Digging further into the code, it turned out that getProcessedFilter() hits
the filterCache by calling computeIfAbsent().
I'll be writing a new blog post about caches in Solr, with the main focus on
the filterCache. Hopefully, other people will benefit from my findings.
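
The debugging trap described above can be reproduced with a small, self-contained Java sketch. CountingCache is a hypothetical class written for this illustration; it is not Solr's CaffeineCache and makes no claim about its internals. It only shows that a cache can be read and populated through computeIfAbsent() without get() ever being called, so a breakpoint placed in get() alone would stay silent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CountingCache<K, V> {
    private final Map<K, V> store = new HashMap<>();

    int getCalls;      // how often get() was called
    int computeCalls;  // how often computeIfAbsent() was called
    int loads;         // how often a value actually had to be computed

    /** Plain lookup: the method where the breakpoint was placed. */
    V get(K key) {
        getCalls++;
        return store.get(key);
    }

    /** Lookup that computes and stores the value on a miss, bypassing get(). */
    V computeIfAbsent(K key, Function<? super K, ? extends V> loader) {
        computeCalls++;
        return store.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    public static void main(String[] args) {
        CountingCache<String, String> cache = new CountingCache<>();
        // The key and value are made-up placeholders for a filter query and its doc set.
        cache.computeIfAbsent("inStock:true", q -> "docset for " + q); // miss: loader runs
        cache.computeIfAbsent("inStock:true", q -> "docset for " + q); // hit: loader skipped
        System.out.println("get() calls: " + cache.getCalls);
        System.out.println("computeIfAbsent() calls: " + cache.computeCalls);
        System.out.println("loader runs: " + cache.loads);
    }
}
```

In this sketch the second call is served from the cache even though get() was never invoked.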

Thanks again,
Daniele

On Mon, Jan 3, 2022 at 9:48 PM Mikhail Khludnev <mk...@apache.org>
wrote:

> Daniele,
> if !useFilterCache
>         DocSet qDocSet = getDocListAndSetNC(qr, cmd);
> https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1420
> which calls getProcessedFilter() that hits FilterCache as you expect.
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Re: [Solr] does not use the filterCache

Posted by Daniele Antuzi <da...@gmail.com>.
Hi Mikhail,
Thanks for your reply.
Probably I wasn't clear enough. In the piece of code I pointed out
<https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1387-L1398>,
the searcher decides whether to use (or not use) the filterCache by setting
the boolean *useFilterCache*.

The searcher will use the filterCache in the search only if

   - the filterCache exists
   - AND the flags *GET_SCORES* and *NO_CHECK_FILTERCACHE* are not set
   - AND the parameter *useFilterForSortedQuery* is true (it is false by
   default, and I don't really understand why)
   - AND the sort is not null
   - AND none of the sort clauses contains the score

If I am not mistaken, a null sort means that the resultset is sorted by
score.
So, if the resultset is sorted implicitly or explicitly by score, the
searcher does not use the filterCache. Does anyone know why?
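
The conditions listed above can be written down as a small, self-contained Java sketch. This is a paraphrase of the list in this message, not a copy of SolrIndexSearcher: the class FilterCacheDecision, the SortKind enum, and the flag bit values are assumptions chosen for illustration, and only the names GET_SCORES, NO_CHECK_FILTERCACHE, useFilterForSortedQuery and useFilterCache come from the code being discussed.

```java
import java.util.Arrays;
import java.util.List;

final class FilterCacheDecision {
    // Illustrative bit values only; they are not taken from the Solr source.
    static final int GET_SCORES = 0x01;
    static final int NO_CHECK_FILTERCACHE = 0x02;

    /** Simplified stand-in for a sort clause: by score or by some field. */
    enum SortKind { SCORE, FIELD }

    static boolean useFilterCache(boolean filterCacheExists,
                                  int flags,
                                  boolean useFilterForSortedQuery,
                                  List<SortKind> sort) {
        if (!filterCacheExists) {
            return false;
        }
        if ((flags & (GET_SCORES | NO_CHECK_FILTERCACHE)) != 0) {
            return false;
        }
        if (!useFilterForSortedQuery) {
            return false;
        }
        if (sort == null) {
            return false; // a null sort means the implicit sort by score
        }
        // An explicit sort qualifies only if no clause sorts by score.
        return !sort.contains(SortKind.SCORE);
    }

    public static void main(String[] args) {
        List<SortKind> byField = Arrays.asList(SortKind.FIELD);
        List<SortKind> byScoreThenField = Arrays.asList(SortKind.SCORE, SortKind.FIELD);
        System.out.println(useFilterCache(true, 0, true, byField));          // prints true
        System.out.println(useFilterCache(true, 0, true, byScoreThenField)); // prints false
        System.out.println(useFilterCache(true, 0, true, null));             // prints false
    }
}
```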



On Mon, Jan 3, 2022 at 4:50 PM Mikhail Khludnev <mk...@apache.org>
wrote:

> Hi, Adrien. Thanks for forwarding this.
> Daniele, you pointed to the code which bypasses Lucene searching and just
> sorts cached docset.
> Applying filter before searching is done by getProcessedFilter()
> https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L956
>
> Happy New Year!
>
> --
> Sincerely yours
> Mikhail Khludnev
>

Re: [Solr] does not use the filterCache

Posted by Mikhail Khludnev <mk...@apache.org>.
Hi, Adrien. Thanks for forwarding this.
Daniele, you pointed to the code which bypasses Lucene searching and just
sorts the cached docset.
Applying the filter before searching is done by getProcessedFilter()
https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L956
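
As a rough illustration of what "just sorts the cached docset" can mean, here is a minimal, self-contained Java sketch. It is not the Solr implementation: the class SortCachedDocSet, the method topByField, and the long[] array standing in for per-document field values are all hypothetical. The point is only that sorting by a field needs nothing but the doc ids and their field values, so no query has to be executed and no scores exist on this path.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

final class SortCachedDocSet {

    /**
     * cachedDocs is an unordered set of matching doc ids (a stand-in for a cached doc set).
     * fieldValues[doc] is the sort value of each document.
     * Returns the first n doc ids in ascending field order; ties keep doc id order.
     */
    static int[] topByField(BitSet cachedDocs, long[] fieldValues, int n) {
        return cachedDocs.stream()
            .boxed()
            .sorted(Comparator.comparingLong((Integer doc) -> fieldValues[doc]))
            .limit(n)
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args) {
        BitSet cachedDocs = new BitSet();
        cachedDocs.set(2);
        cachedDocs.set(3);
        cachedDocs.set(5);
        // Made-up field values indexed by doc id, e.g. a price per document.
        long[] price = {50, 10, 30, 5, 7, 20};
        System.out.println(Arrays.toString(topByField(cachedDocs, price, 2))); // prints [3, 5]
    }
}
```

A sort by score cannot be served this way, because the cached set carries no scores.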

Happy New Year!

On Mon, Jan 3, 2022 at 5:12 PM Adrien Grand <jp...@gmail.com> wrote:

> Hi Daniele,
>
> This is the Lucene dev list, I'm redirecting your question to
> dev@solr.apache.org.
>
> --
> Adrien

-- 
Sincerely yours
Mikhail Khludnev

Re: [Solr] does not use the filterCache

Posted by Adrien Grand <jp...@gmail.com>.
Hi Daniele,

This is the Lucene dev list, I'm redirecting your question to
dev@solr.apache.org.

On Fri, Dec 31, 2021 at 5:35 PM Daniele Antuzi <da...@gmail.com> wrote:
>
> Hi,
> I was taking a look at the Solr searcher to see how the filterCache is used: https://github.com/apache/solr/blob/c2db3a943e665cfb39e9ea53640be40cf2c09fbc/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1379-L1398
> Reading the code, it turned out that the filterCache is not used if the sort contains the score or if no sort is specified (by default, results are sorted by score).
> As far as I know, the filterCache contains an unordered set of documents so the sort must be calculated after the application of the filter query.
> The score should then also be computed after the filter query, so that it is calculated on a smaller set of documents.
> That being said, I don't understand why Solr does not use the filterCache if the score is somehow involved in the sort.
> In theory, it could:
>
>    1. apply the filter query, reducing the number of results
>    2. compute the score
>    3. sort the results
>
> Am I missing something?
>
> Happy new year,
> Daniele
>


-- 
Adrien
