
Posted to dev@lucene.apache.org by Tom Burton-West <tb...@umich.edu> on 2014/09/24 20:10:35 UTC

queryResultMaxDocsCached vs. queryResultWindowSize

Hello,

No response on the Solr user list so I thought I would try the dev list.


queryResultWindowSize sets the number of documents to cache for each query
in the queryResultCache.  So if you normally output 10 results per page,
and users don't go beyond page 3 of results, you could set
queryResultWindowSize to 30 and the second and third page requests will
read from cache, not from disk.  This is well documented in both the Solr
example solrconfig.xml file and the Solr documentation.

However, the example in solrconfig.xml and the documentation in the
reference manual for Solr 4.10 say that queryResultMaxDocsCached :

"sets the maximum number of documents to cache for any entry in the
queryResultCache".

Looking at the code, it appears that the queryResultMaxDocsCached parameter
actually tells Solr not to cache any result list whose size is over
queryResultMaxDocsCached.

From SolrIndexSearcher.getDocListC:

    // lastly, put the superset in the cache if the size is less than or equal
    // to queryResultMaxDocsCached
    if (key != null && superset.size() <= queryResultMaxDocsCached && !qr.isPartialResults()) {
      queryResultCache.put(key, superset);
    }

Deciding whether or not to cache a DocList based on whether its size is
over N (where N = queryResultMaxDocsCached) is very different from caching
only N items from the DocList, which is what the current documentation (and
the variable name) implies.
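
To make the distinction concrete, here is a minimal sketch of the two
readings. It is not Solr code: the class, the method names, and the plain
Map standing in for the queryResultCache are all hypothetical, and the
threshold of 200 is simply the value from the example solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxDocsCachedSemantics {
    // Hypothetical stand-in for queryResultMaxDocsCached.
    static final int MAX_DOCS_CACHED = 200;

    // Reading 1 (what the quoted check does): skip caching entirely
    // when the list holds more than N ids.
    static void putIfSmallEnough(Map<String, List<Integer>> cache, String key, List<Integer> docIds) {
        if (docIds.size() <= MAX_DOCS_CACHED) {
            cache.put(key, docIds);
        }
    }

    // Reading 2 (what the documentation wording suggests): always cache,
    // but keep at most the first N ids.
    static void putTruncated(Map<String, List<Integer>> cache, String key, List<Integer> docIds) {
        int n = Math.min(docIds.size(), MAX_DOCS_CACHED);
        cache.put(key, new ArrayList<>(docIds.subList(0, n)));
    }

    // Builds a dummy list of doc ids 0..count-1.
    static List<Integer> ids(int count) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> skipCache = new HashMap<>();
        Map<String, List<Integer>> truncCache = new HashMap<>();
        List<Integer> big = ids(220); // larger than the threshold

        putIfSmallEnough(skipCache, "q", big);
        putTruncated(truncCache, "q", big);

        System.out.println("skip-if-larger cached anything: " + skipCache.containsKey("q"));
        System.out.println("truncate cached size: " + truncCache.get("q").size());
    }
}
```

With a 220-document list, the first reading leaves the cache empty for
that query, while the second would keep 200 ids.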

Looking at the JIRA issue https://issues.apache.org/jira/browse/SOLR-291,
the original intent was to control memory use, and the variable name
originally suggested was "noCacheIfLarger".

Can someone please let me know if it is true that the
queryResultMaxDocsCached parameter actually tells Solr not to cache any
result list that contains more than queryResultMaxDocsCached documents?

If so, I will add a comment to the Cwiki doc and open a JIRA and submit a
patch to the example file.

I tried to find a test case that exercises SolrIndexSearcher.getDocListC
so I could see how queryResultWindowSize or queryResultMaxDocsCached
actually work in the debugger, but could not find one.  Could
someone please point me to a good test case that either exercises
SolrIndexSearcher.getDocListC or would be a good starting point for writing
one?


Tom



---------------------------

http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/solr/collection1/conf/solrconfig.xml?revision=1624269&view=markup

    <!-- Maximum number of documents to cache for any entry in the
         queryResultCache.
      -->
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

Re: queryResultMaxDocsCached vs. queryResultWindowSize

Posted by Tom Burton-West <tb...@umich.edu>.

Thanks for your help Yonik and Tomas,

I had several mistaken assumptions about how caching worked, which were
resolved by walking through the code in the debugger after reading your
replies.

Tom


On Fri, Sep 26, 2014 at 4:55 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Fri, Sep 26, 2014 at 4:38 PM, Tom Burton-West <tb...@umich.edu>
> wrote:
> > Hi Yonik,
> >
> > I'm still confused.
> >
> > I suspect I don't understand how paging and caching interact.  I probably
> > need to walk through the code.  Is there a unit test that exercises
> > SolrIndexSearcher.getDocListC or a good unit test to use as a base to
> > write one?
> >
> >
> > Part of what confuses me is whether what gets cached always starts at
> > row 1 of results.
>
> Yes, we always cache from the first row.
> Asking for rows 91-100 requires collecting 1-100 (and it's the latter
> we cache - ignoring deep paging).
> It's also just ids (and optionally scores) that are cached... so
> either 4 bytes or 8 bytes per document cached, depending on if you ask
> for scores back.
>
> queryResultWindowSize just rounds up the upper bound.
>
> > I'll try to explain my confusion.
> > Using the defaults in the solrconfig example:
> > <queryResultWindowSize>20</queryResultWindowSize>
> > <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> >
> > If I query for start=0, rows=10, Solr fetches 20 results and caches them.
> > If I query for start=11, rows=10, Solr reads rows 11-20 from cache.
>
> Correct.
>
> > What happens when I query for start=21, rows=10?
> > I thought that Solr would then fetch rows 21-40 into the
> > queryResultCache.
> > Is this wrong?
>
> It will result in a cache miss and we'll collect 0-40 and cache that.
>
> > If I query for start=195, rows=10, does Solr cache rows 195-200 but go
> > to disk for rows over 200 (queryResultMaxDocsCached=200)?  Or does Solr
> > skip caching altogether for rows over 200?
>
> Probably the latter... it's an edge case so I'd have to check the code
> to know for sure if the check is pre or post rounding up.
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: queryResultMaxDocsCached vs. queryResultWindowSize

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Fri, Sep 26, 2014 at 4:38 PM, Tom Burton-West <tb...@umich.edu> wrote:
> Hi Yonik,
>
> I'm still confused.
>
> I suspect I don't understand how paging and caching interact.  I probably need
> to walk through the code.  Is there a unit test that exercises
> SolrIndexSearcher.getDocListC or a good unit test to use as a base to write
> one?
>
>
> Part of what confuses me is whether what gets cached always starts at row 1
> of results.

Yes, we always cache from the first row.
Asking for rows 91-100 requires collecting 1-100 (and it's the latter
we cache - ignoring deep paging).
It's also just ids (and optionally scores) that are cached... so
either 4 bytes or 8 bytes per document cached, depending on if you ask
for scores back.
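
As a rough illustration of what those per-document figures mean for memory,
the sketch below multiplies them out. It only counts the 4 or 8 bytes per
cached document mentioned above; object overhead and cache keys are
ignored, and the entry count of 512 is a hypothetical cache size chosen
for the example, not a value from this thread.

```java
public class QueryResultCacheMemoryEstimate {
    // 4 bytes per cached doc id, 8 bytes when scores are cached as well
    // (figures taken from the message above).
    static long bytesPerEntry(int docsCached, boolean withScores) {
        return (long) docsCached * (withScores ? 8 : 4);
    }

    public static void main(String[] args) {
        int maxDocsCached = 200; // queryResultMaxDocsCached from the example solrconfig.xml
        int cacheEntries = 512;  // hypothetical number of queryResultCache entries

        long idsOnly = bytesPerEntry(maxDocsCached, false);
        long withScores = bytesPerEntry(maxDocsCached, true);

        System.out.println("largest entry, ids only:    " + idsOnly + " bytes");
        System.out.println("largest entry, with scores: " + withScores + " bytes");
        System.out.println("upper bound for all entries: " + (withScores * cacheEntries) + " bytes");
    }
}
```

Under these assumptions the largest cacheable entry is 800 bytes of ids
(1600 with scores), which is why the threshold acts as a per-entry memory
cap.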

queryResultWindowSize just rounds up the upper bound.

> I'll try to explain my confusion.
> Using the defaults in the solrconfig example:
> <queryResultWindowSize>20</queryResultWindowSize>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>
> If I query for start=0, rows=10, Solr fetches 20 results and caches them.
> If I query for start=11, rows=10, Solr reads rows 11-20 from cache.

Correct.

> What happens when I query for start=21, rows=10?
> I thought that Solr would then fetch rows 21-40 into the queryResultCache.
> Is this wrong?

It will result in a cache miss and we'll collect 0-40 and cache that.

> If I query for start=195, rows=10, does Solr cache rows 195-200 but go to
> disk for rows over 200 (queryResultMaxDocsCached=200)?  Or does Solr skip
> caching altogether for rows over 200?

Probably the latter... it's an edge case so I'd have to check the code
to know for sure if the check is pre or post rounding up.
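
The paging behaviour described in these answers can be sketched as a small
simulation. This is a toy model, not Solr's implementation: the class and
method names are hypothetical, start is treated as zero-based, a cached
entry is modelled only by how many top documents it holds, and the size
check is assumed to apply to the collected superset (after rounding up),
following the superset.size() excerpt in the first message of this thread.
That last point is exactly the edge case left open above, so treat it as
an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class PagingCacheSketch {
    final int windowSize;    // stands in for queryResultWindowSize
    final int maxDocsCached; // stands in for queryResultMaxDocsCached
    // query string -> number of top documents held in the cached entry
    final Map<String, Integer> cachedSize = new HashMap<>();

    PagingCacheSketch(int windowSize, int maxDocsCached) {
        this.windowSize = windowSize;
        this.maxDocsCached = maxDocsCached;
    }

    // Round the requested upper bound (start + rows) up to a multiple of the window.
    static int roundUp(int upper, int window) {
        return ((upper + window - 1) / window) * window;
    }

    // Simulates one request and reports whether it hit the cache.
    String request(String query, int start, int rows, int totalHits) {
        int upper = Math.min(start + rows, totalHits);
        Integer cached = cachedSize.get(query);
        if (cached != null && cached >= upper) {
            return "hit";
        }
        // Cache miss: collect from the first row up to the rounded bound.
        int supersetSize = Math.min(roundUp(start + rows, windowSize), totalHits);
        if (supersetSize <= maxDocsCached) {
            cachedSize.put(query, supersetSize);
            return "miss, cached top " + supersetSize;
        }
        // Assumption: oversized supersets are not cached at all.
        return "miss, not cached (" + supersetSize + " > " + maxDocsCached + ")";
    }

    public static void main(String[] args) {
        PagingCacheSketch cache = new PagingCacheSketch(20, 200);
        int totalHits = 1000; // hypothetical number of matching documents
        System.out.println(cache.request("q", 0, 10, totalHits));
        System.out.println(cache.request("q", 10, 10, totalHits));
        System.out.println(cache.request("q", 20, 10, totalHits));
        System.out.println(cache.request("q", 195, 10, totalHits));
    }
}
```

In this model the third request misses and re-collects the top 40 rather
than fetching only rows 21-40, and the request reaching row 205 rounds up
to 220 and is not cached, while the earlier top-40 entry stays in place.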

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: queryResultMaxDocsCached vs. queryResultWindowSize

Posted by Tom Burton-West <tb...@umich.edu>.

Hi Yonik,

I'm still confused.

I suspect I don't understand how paging and caching interact.  I probably need
to walk through the code.  Is there a unit test that exercises
SolrIndexSearcher.getDocListC
or a good unit test to use as a base to write one?

Part of what confuses me is whether what gets cached always starts at row 1
of results.  I did not think this was true, but your example of start=10000
rows=10 (i.e. rows 10000 through 10010) triggering the
queryResultMaxDocsCached limit of 200 makes it sound like the cache always
starts at row 1.  I would have thought that a request for start=10000
rows=10 would result in Solr caching rows 10,000-10,020.

I'll try to explain my confusion.
Using the defaults in the solrconfig example:
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

If I query for start=0, rows=10, Solr fetches 20 results and caches them.
If I query for start=11, rows=10, Solr reads rows 11-20 from cache.
What happens when I query for start=21, rows=10?
I thought that Solr would then fetch rows 21-40 into the queryResultCache.
Is this wrong?

If I query for start=195, rows=10, does Solr cache rows 195-200 but go to
disk for rows over 200 (queryResultMaxDocsCached=200)?  Or does Solr skip
caching altogether for rows over 200?

Tom

On Wed, Sep 24, 2014 at 7:12 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> On Wed, Sep 24, 2014 at 5:27 PM, Tomás Fernández Löbbe
> <to...@gmail.com> wrote:
> > I think you are right. I think the name is this because it’s considering
> > a series of queries paging a result. The first X pages are going to be
> > cached, but once the limit is reached, no further pages are and the last
> > superset that fitted remains in cache.
>
> I was confused about the confusion ;-)  But your summary seems correct.
>
> queryResultWindowSize rounds up to a multiple of the window size for
> caching purposes.
> So if you ask for top 10, and the queryResultWindowSize is 20, then
> the top 20 will be cached (so if a user hits "next" to get to the next
> 10, it will still result in a cache hit).
>
> queryResultMaxDocsCached sets a limit beyond which the resulting docs
> aren't cached (so if a user asks for docs 10000 through 10010, we skip
> caching logic).
>
> -Yonik
> http://heliosearch.org - native code faceting, facet functions,
> sub-facets, off-heap data

Re: queryResultMaxDocsCached vs. queryResultWindowSize

Posted by Yonik Seeley <yo...@heliosearch.com>.

On Wed, Sep 24, 2014 at 5:27 PM, Tomás Fernández Löbbe
<to...@gmail.com> wrote:
> I think you are right. I think the name is this because it’s considering a
> series of queries paging a result. The first X pages are going to be cached,
> but once the limit is reached, no further pages are and the last superset
> that fitted remains in cache.

I was confused about the confusion ;-)  But your summary seems correct.

queryResultWindowSize rounds up to a multiple of the window size for
caching purposes.
So if you ask for top 10, and the queryResultWindowSize is 20, then
the top 20 will be cached (so if a user hits "next" to get to the next
10, it will still result in a cache hit).

queryResultMaxDocsCached sets a limit beyond which the resulting docs
aren't cached (so if a user asks for docs 10000 through 10010, we skip
caching logic).

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: queryResultMaxDocsCached vs. queryResultWindowSize

Posted by Tomás Fernández Löbbe <to...@gmail.com>.

I think you are right. I think the name is this because it’s considering a
series of queries paging a result. The first X pages are going to be
cached, but once the limit is reached, no further pages are and the last
superset that fitted remains in cache. At least that’s my understanding.
After a quick look, I couldn’t find a test case for this either.

Tomás
