Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2021/11/24 16:50:00 UTC

[jira] [Commented] (LUCENE-10235) LRUQueryCache should not count never-cacheable queries as a miss

    [ https://issues.apache.org/jira/browse/LUCENE-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448726#comment-17448726 ] 

Adrien Grand commented on LUCENE-10235:
---------------------------------------

I agree that it is a bit unintuitive, but I feel this is due more to the nature of this cache, which is atypical in that it waits for queries to be used frequently before putting them into the cache, than to the miss count being incorrectly reported, since every counted miss does map to a "get" on the cache?

For this cache it's probably more interesting to compare the hit count with the number of times we put something into the cache (LRUQueryCache#onDocIdSetCache). Am I getting it right that you are thinking of `ignored` as being `missCount - docIdSetCacheCount`, i.e. the number of times that we didn't find an entry in the cache and yet decided not to create a cache entry?
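
A minimal sketch of that arithmetic, assuming the three counters have already been sampled from the cache. The class `QueryCacheStatsSketch` and its method names are hypothetical and not part of Lucene; `cacheCount` stands in for the `docIdSetCacheCount` mentioned above.

```java
// Hypothetical helper, not part of Lucene: derives the figures discussed
// above from three counter values sampled from a query cache.
final class QueryCacheStatsSketch {
    private final long hitCount;   // lookups that found a cached entry
    private final long missCount;  // lookups that found no entry
    private final long cacheCount; // entries created and put into the cache

    QueryCacheStatsSketch(long hitCount, long missCount, long cacheCount) {
        this.hitCount = hitCount;
        this.missCount = missCount;
        this.cacheCount = cacheCount;
    }

    // Misses that did not lead to a new cache entry: missCount - cacheCount.
    long ignored() {
        return missCount - cacheCount;
    }

    // Average number of hits served per entry put into the cache.
    // Returns 0.0 when nothing has been cached yet, to avoid dividing by zero.
    double hitsPerCachedEntry() {
        return cacheCount == 0 ? 0.0 : (double) hitCount / cacheCount;
    }
}
```

For example, 50 hits, 100 misses and 20 cached entries give 80 ignored lookups and 2.5 hits per cached entry.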


> LRUQueryCache should not count never-cacheable queries as a miss
> ----------------------------------------------------------------
>
>                 Key: LUCENE-10235
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10235
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yannick Welsch
>            Priority: Minor
>
> Hit and miss counts of a cache are typically used to check how effective a caching layer is. While looking at a system that exhibited a very high miss-to-hit ratio, I took a closer look at Lucene's LRUQueryCache and noticed that it counts as a miss even queries that it would never consider caching in the first place (e.g. TermQuery and others mentioned in UsageTrackingQueryCachingPolicy.shouldNeverCache).
> The reason these are counted as a miss is that LRUQueryCache (scorerSupplier and bulkScorer methods) first does a lookup on the cache, incrementing hit or miss counters, and upon miss, only then checks QueryCachingPolicy.shouldCache to decide whether that query should be put into the cache.
> This issue is made more complex by the fact that QueryCachingPolicy.shouldCache is a stateful method, and cacheability of a query can change over time (e.g. after appearing N times).
> I'm opening this issue to discuss whether others also feel that the current way of accounting misses is unintuitive / confusing. I would also like to put forward a proposal to:
>  * generalize the boolean QueryCachingPolicy.shouldCache method to return an enum instead (one of YES, NOT_RIGHT_NOW, NEVER), and only account queries that are (eventually) cacheable and not in the cache as a miss,
>  * optionally introduce another metric for queries that are never cacheable, e.g. "ignored", and
>  * optionally refine miss count into a count for items that are cacheable right away, and those that will eventually be cacheable.
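
The proposal quoted above could be sketched roughly as follows. This is a toy model and not Lucene code: the class `TriStateCacheSketch`, the `Policy` interface, the string-keyed queries and the "term:" prefix convention are all assumptions made for illustration. Only the three enum values come from the proposal itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Lucene code: a cache front-end that consults a
// tri-state policy before deciding how to account for a failed lookup.
final class TriStateCacheSketch {
    enum Cacheability { YES, NOT_RIGHT_NOW, NEVER }

    interface Policy {
        Cacheability shouldCache(String query);
    }

    // Toy stateful policy (an assumption): queries starting with "term:" are
    // never cacheable; others become cacheable once seen minUses times.
    static final class UsageCountingPolicy implements Policy {
        private final Map<String, Integer> uses = new HashMap<>();
        private final int minUses;

        UsageCountingPolicy(int minUses) {
            this.minUses = minUses;
        }

        @Override
        public Cacheability shouldCache(String query) {
            if (query.startsWith("term:")) {
                return Cacheability.NEVER;
            }
            int seen = uses.merge(query, 1, Integer::sum);
            return seen >= minUses ? Cacheability.YES : Cacheability.NOT_RIGHT_NOW;
        }
    }

    private final Set<String> cached = new HashSet<>();
    private final Policy policy;
    long hitCount;
    long missCount;
    long ignoredCount;

    TriStateCacheSketch(Policy policy) {
        this.policy = policy;
    }

    void lookup(String query) {
        if (cached.contains(query)) {
            hitCount++;
            return;
        }
        switch (policy.shouldCache(query)) {
            case NEVER:
                ignoredCount++; // never cacheable: not counted as a miss
                break;
            case NOT_RIGHT_NOW:
                missCount++;    // eventually cacheable, but not cached yet
                break;
            case YES:
                missCount++;    // cacheable now: count the miss, then cache
                cached.add(query);
                break;
        }
    }
}
```

With `minUses` set to 2, three lookups of "term:foo" followed by three lookups of "range:a" end with `ignoredCount` at 3, `missCount` at 2 and `hitCount` at 1.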


