Posted to issues@lucene.apache.org by "Gautam Worah (Jira)" <ji...@apache.org> on 2021/09/01 02:12:00 UTC

[jira] [Commented] (LUCENE-10068) Switch to a "double barrel" HPPC cache for the taxonomy LRU cache

    [ https://issues.apache.org/jira/browse/LUCENE-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407758#comment-17407758 ] 

Gautam Worah commented on LUCENE-10068:
---------------------------------------

I was initially trying to benchmark the hit rate of the category cache, but [~mikemccand] suggested that I simply disable it and see if that affects benchmarks. [Here|https://github.com/gautamworah96/lucene/tree/testtaxocachehitrate] is a branch that does that.

Full results are attached to this JIRA issue as a file (disable_taxo_category_cache_benchmark).

TL;DR: we don't see any regression. I guess we can either increase the size of the cache (maybe to 10k?) and experiment again, or else just remove it entirely (preferred).

Makes me wonder whether the ordinal cache is needed at all :/








> Switch to a "double barrel" HPPC cache for the taxonomy LRU cache
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10068
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 8.8.1
>            Reporter: Gautam Worah
>            Priority: Minor
>         Attachments: disable_taxo_category_cache_benchmark
>
>
> While working on an unrelated getBulkPath API [PR|https://github.com/apache/lucene/pull/179], [~mikemccand] and I came across a nice optimization that could be made to the taxonomy cache.
> The taxonomy cache today caches frequently used ordinals and their corresponding FacetLabels. It uses the existing LRUHashMap class (backed by a LinkedList) for its implementation.
> This implementation performs suboptimally when a large number of threads access it, and it consumes a large amount of RAM.
> [~mikemccand] suggested the idea of an HPPC int->FacetLabel cache backed by two maps ("double barrel"). The basic idea behind the cache is (see the sketch below the quoted description):
>  # We use two hashmaps, primary and secondary.
>  # On a cache miss in the primary and a cache hit in the secondary, we add the key to the primary map as well.
>  # On a cache miss in both maps, we add the key to the primary map.
>  # When the primary map grows large (should we make this check on each insert?), say larger than the existing DEFAULT_CACHE_VALUE=4000, we dump the secondary map and copy all of the primary map's entries into it.
> The idea was originally explained in [this|https://github.com/apache/lucene/pull/179#discussion_r692907559] comment.
>  
>  
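To make the scheme above concrete, here is a rough, untested sketch of what such a double-barrel cache could look like on top of HPPC's IntObjectHashMap. The class name (DoubleBarrelTaxonomyCache) and all member names are made up for illustration, and thread safety is deliberately left out (the real cache is accessed by many threads, so it would need synchronization or concurrent data structures):

import com.carrotsearch.hppc.IntObjectHashMap;
import org.apache.lucene.facet.taxonomy.FacetLabel;

// Illustrative sketch only -- not the actual proposed implementation, and not thread safe.
class DoubleBarrelTaxonomyCache {
  private final int maxSize; // e.g. the existing DEFAULT_CACHE_VALUE = 4000
  private IntObjectHashMap<FacetLabel> primary;
  private IntObjectHashMap<FacetLabel> secondary;

  DoubleBarrelTaxonomyCache(int maxSize) {
    this.maxSize = maxSize;
    this.primary = new IntObjectHashMap<>(maxSize);
    this.secondary = new IntObjectHashMap<>(maxSize);
  }

  FacetLabel get(int ordinal) {
    FacetLabel label = primary.get(ordinal);
    if (label != null) {
      return label; // hit in the primary map
    }
    label = secondary.get(ordinal);
    if (label != null) {
      put(ordinal, label); // hit only in the secondary map: promote it to the primary
    }
    return label; // null means a miss in both maps
  }

  void put(int ordinal, FacetLabel label) {
    primary.put(ordinal, label); // new entries (and promotions) always go to the primary
    if (primary.size() > maxSize) {
      // Primary is "full": dump the secondary, let the primary's entries take its place
      // (a pointer swap rather than a literal copy), and start over with an empty primary.
      secondary = primary;
      primary = new IntObjectHashMap<>(maxSize);
    }
  }
}

This keeps at most roughly 2x maxSize entries in memory and approximates LRU behavior (recently used entries keep getting promoted into the current primary) without the per-access reordering that a linked-list-based LRU map needs.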



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org