Posted to issues@lucene.apache.org by "Julie Tibshirani (Jira)" <ji...@apache.org> on 2021/08/03 12:14:00 UTC

[jira] [Created] (LUCENE-10043) Increase default for LRUQueryCache#skipCacheFactor?

Julie Tibshirani created LUCENE-10043:
-----------------------------------------

             Summary: Increase default for LRUQueryCache#skipCacheFactor?
                 Key: LUCENE-10043
                 URL: https://issues.apache.org/jira/browse/LUCENE-10043
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Julie Tibshirani


In LUCENE-9002 we introduced logic to skip caching a clause if it would be too expensive compared to the usual query cost. Specifically, we avoid caching a clause if its estimated cost is more than 'skipCacheFactor' times higher than the lead iterator's:

{code}
              // skip cache operation which would slow query down too much
              if (cost / skipCacheFactor > leadCost) {
                return supplier.get(leadCost);
              }
{code}

Choosing good defaults is hard! We've seen some examples in Elasticsearch where caching a query clause causes a major slowdown, contributing to poor tail latencies. It made me think that the default 'skipCacheFactor' of 250 may be too high -- interpreted simply, this means we'll cache a clause even if it is ~250 times more expensive than running the top-level query on its own. Would it make sense to decrease this to 10 or so? It seems okay to err on the side of less caching for individual clauses, especially since any parent 'BooleanQuery' is already eligible for caching.
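
To make the effect of the factor concrete, here is a minimal standalone sketch of the check above. It is not the Lucene code: the class name 'SkipCacheSketch', the method 'shouldSkipCaching', and the sample costs are all hypothetical, and treating the factor as a float is an assumption made for illustration.

```java
// Hypothetical illustration of the skip-cache comparison; not Lucene's implementation.
class SkipCacheSketch {

    // Mirrors the quoted condition: skip caching when the clause's estimated
    // cost, divided by the factor, still exceeds the lead iterator's cost.
    static boolean shouldSkipCaching(long cost, long leadCost, float skipCacheFactor) {
        return cost / skipCacheFactor > leadCost;
    }

    public static void main(String[] args) {
        long leadCost = 1_000;    // made-up cost of the lead iterator
        long clauseCost = 50_000; // made-up cost of the clause, 50x the lead cost

        // Factor 250: 50_000 / 250 = 200, which is not > 1_000, so the clause is cached.
        System.out.println("factor 250, skip caching: "
            + shouldSkipCaching(clauseCost, leadCost, 250f));

        // Factor 10: 50_000 / 10 = 5_000, which is > 1_000, so caching is skipped.
        System.out.println("factor 10, skip caching: "
            + shouldSkipCaching(clauseCost, leadCost, 10f));
    }
}
```

With these made-up numbers, a clause 50 times more expensive than the lead iterator is still cached under a factor of 250, but would be skipped under a factor of 10.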

As a note, the interpretation "~250 times more expensive than running the top-level query on its own" isn't perfectly accurate. The true cost doesn't depend only on the number of matched documents, but also on the cost of matching itself. Making it even more complex, some queries like 'IndexOrDocValuesQuery' have different matching strategies based on whether they're used as a lead iterator or verifier.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org