Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2006/06/27 03:07:52 UTC

autowarmCount usefulness

I'm trying to fully understand the LRUCache and the autowarmCount  
parameter.   Why does it make sense to auto-warm filters and query  
results?   In my case, if a new document is added it may invalidate  
many filters, and it would require knowing the details of the  
documents added/removed to know which caches could be copied.

Can someone shed light on the scenarios where blindly copying over  
any cached filters (or query results) makes sense?

Thanks,
	Erik


Re: autowarmCount usefulness

Posted by Chris Hostetter <ho...@fucit.org>.
: How do I use LRUCache as a custom user cache to deal with cache
: misses and look up data dynamically then?   It seems to me that
: LRUCache.get() should deal with misses itself and call the
: regenerator if the key is not found.  But rather SolrIndexSearcher
: deals with this.  If I define a custom cache as an LRUCache with a
: custom regenerator, it looks like I have to add a bit more custom
: code around where I use that cache to deal with misses.  Does it make
: sense that LRUCache would pass through to a regenerator on .get() if
: the key is not found?

You are correct that you need a bit of wrapper code to deal with cache misses,
and you're right: in many cases it might make sense for the Cache to use the
regenerator to fetch it for you -- but it doesn't currently work that way.

I wasn't all that fond of it when Yonik built it, but I've since come to
realize there are compelling reasons for it (besides the obvious "Yonik
was mad at me that day and wanted to make me have to write more code.")

The first is that it gives you the opportunity to make a conscious choice
whether you want to bother caching the object when it's a cache miss,
independent of what the cache implementation/configuration might be.  Your
plugin can say something like...

    Object data = cache.get(key);
    if (null == data) {
       // cache miss: compute the value ourselves
       data = computeData();
       // only cache the result if it's actually worth keeping
       if (dataIsWorthCaching(data)) { cache.put(key, data); }
    }

...where the dataIsWorthCaching method might look at lots of factors that
wouldn't make sense to put into a generic cache replacement policy ... I
don't have any practical examples of this, but I'm sure some exist.

The second, and more concrete, reason is that the method for generating the
data might be completely independent of the method of "regenerating" the
cache entry.  In the case of the SolrPluginUtils.IdentityRegenerator, that's
useful when the data is completely independent of the Searcher.  I have a
plugin which actively generates some data on a Cache miss, and lets that
regenerator just keep reusing the same object.  When the plugin is
notified that the data has changed, it "puts" the new data in the current
cache.
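
In code, that pattern boils down to something like the following sketch
(the class name is made up and only the CacheRegenerator interface is
assumed; SolrPluginUtils.IdentityRegenerator is the real-world analog):

    import java.io.IOException;
    import org.apache.solr.search.CacheRegenerator;
    import org.apache.solr.search.SolrCache;
    import org.apache.solr.search.SolrIndexSearcher;

    /** Carries every old entry into the new cache unchanged, because
        the cached data doesn't depend on the searcher at all. */
    public class PassThroughRegenerator implements CacheRegenerator {
      public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                    SolrCache newCache, SolrCache oldCache,
                                    Object oldKey, Object oldVal)
          throws IOException {
        // reuse the very same object; a new searcher doesn't invalidate it
        newCache.put(oldKey, oldVal);
        return true;  // true == keep warming the remaining entries
      }
    }

The plugin remains the only code that ever *computes* a value, so data
generation and cache regeneration stay completely decoupled.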

(Bear in mind, however, that the Cache API does give every cache instance a
reference to the CacheRegenerator when initializing it, so a Cache
implementation could be constructed which does what you describe.)
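
A hypothetical read-through LRUCache along those lines might look like
this (nothing like it ships with Solr; it's just a sketch of what the
API would permit):

    import java.io.IOException;
    import java.util.Map;
    import org.apache.solr.search.CacheRegenerator;
    import org.apache.solr.search.LRUCache;

    public class ReadThroughLRUCache extends LRUCache {
      private CacheRegenerator regenerator;

      public Object init(Map args, Object persistence,
                         CacheRegenerator regenerator) {
        // Solr hands every cache its regenerator here; remember it
        this.regenerator = regenerator;
        return super.init(args, persistence, regenerator);
      }

      public Object get(Object key) {
        Object val = super.get(key);
        if (null == val && null != regenerator) {
          try {
            // repurpose regenerateItem() to compute the missing entry;
            // a regenerator written for this cache would have to ignore
            // the searcher and "old" arguments, which mean nothing here
            regenerator.regenerateItem(null, this, this, key, null);
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
          val = super.get(key);  // present if the regenerator put() it
        }
        return val;
      }
    }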


: "LRU" abbreviation confuses me.... I see "least recently used" when I
: see that, but it really means "last recently used" within Solr.  :)

Actually it is "LEAST recently used" -- just not in the context that
makes sense ...

One of my greatest disappointments with the Computer Science field is that
at some point in history some jackass decided that the best way to refer
to caching implementations was by their replacement strategy -- thus an
LRUCache is one that, when full, will replace the "least recently used"
item with the item being added; meanwhile an LFUCache will replace the
"Least Frequently Used", etc...  Apparently whatever PhD we have to thank
for this convention never considered the fact that when designing cache
replacement strategies, it would be necessary to rank the cached items to
decide which one can be replaced -- and that maybe, just *MAYBE*, it would
make sense if the caches were named after what they value as important,
and not what they consider replaceable.



-Hoss


Re: autowarmCount usefulness

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jun 26, 2006, at 10:38 PM, Chris Hostetter wrote:
> : I'm trying to fully understand the LRUCache and the autowarmCount
> : parameter.   Why does it make sense to auto-warm filters and query
> : results?   In my case, if a new document is added it may invalidate
> : many filters, and it would require knowing the details of the
> : documents added/removed to know which caches could be copied.
> :
> : Can someone shed light on the scenarios where blindly copying over
> : any cached filters (or query results) makes sense?
>
> Autowarming of the filterCache and queryResultCache doesn't just copy
> the cached values -- it reexecutes the queries used as the keys for
> those caches and generates new DocSet/DocLists using the *new*
> searcher, before that searcher is made available to threads serving
> queries over HTTP.

Ah, that was the secret sauce I was missing.  I'm still making my way  
through the codebase understanding how it is put together, and now I  
see the regenerators in the SolrIndexSearcher for these built-in caches.

> For named User caches, autowarming doesn't work at all unless you've
> specified a regenerator -- which can do whatever it wants using the
> new searcher and the information from the old cache.

How do I use LRUCache as a custom user cache to deal with cache  
misses and look up data dynamically then?   It seems to me that  
LRUCache.get() should deal with misses itself and call the  
regenerator if the key is not found.  But rather SolrIndexSearcher  
deals with this.  If I define a custom cache as an LRUCache with a  
custom regenerator, it looks like I have to add a bit more custom  
code around where I use that cache to deal with misses.  Does it make  
sense that LRUCache would pass through to a regenerator on .get() if  
the key is not found?

> The reason autowarming is configured using an autowarmCount is so you
> can control just how much effort Solr should put into the autowarming
> of the new cache ... if you've got a limitless supply of RAM, and an
> index that doesn't change very often, you can make your caches so big
> that no DocSet/DocList is ever generated dynamically more than once --
> but what happens when your index does finally change? ... if your
> autowarmCount is the same as the size of your index, Solr could spend
> days autowarming every query ever executed against your index, even if
> it was only executed one time 3 weeks ago.  The autowarmCount tells
> Solr to only warm the N "best" keys in the cache, where "best" is
> defined by the Cache implementation (for an LRUCache, the "best"
> things are the things most recently used).

"LRU" abbreviation confuses me.... I see "least recently used" when I  
see that, but it really means "last recently used" within Solr.  :)

> Once upon a time Yonik and I hypothesized that it would be cool to
> have autowarmTimelimit and autowarmPercentage (of current size) params
> and some other things like that, so you could have other ways of
> tweaking just how much autowarming is done on your behalf ... but they
> were never built.

No worries there.  The caching is quite nice as it is.  As need  
arises, more bells and whistles can be added, but the current  
parameters are sufficient for my needs so far.

	Erik


Re: autowarmCount usefulness

Posted by Chris Hostetter <ho...@fucit.org>.
: I'm trying to fully understand the LRUCache and the autowarmCount
: parameter.   Why does it make sense to auto-warm filters and query
: results?   In my case, if a new document is added it may invalidate
: many filters, and it would require knowing the details of the
: documents added/removed to know which caches could be copied.
:
: Can someone shed light on the scenarios where blindly copying over
: any cached filters (or query results) makes sense?

Autowarming of the filterCache and queryResultCache doesn't just copy the
cached values -- it reexecutes the queries used as the keys for those
caches and generates new DocSet/DocLists using the *new* searcher, before
that searcher is made available to threads serving queries over HTTP.
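
Conceptually (this is a paraphrase, not the literal Solr source, and the
class name is invented) the filterCache's built-in regeneration amounts
to:

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.solr.search.CacheRegenerator;
    import org.apache.solr.search.SolrCache;
    import org.apache.solr.search.SolrIndexSearcher;

    public class FilterCacheStyleRegenerator implements CacheRegenerator {
      public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                    SolrCache newCache, SolrCache oldCache,
                                    Object oldKey, Object oldVal)
          throws IOException {
        // the old DocSet (oldVal) is ignored entirely; only the Query
        // that produced it is re-executed, against the *new* searcher
        newCache.put(oldKey, newSearcher.getDocSet((Query) oldKey));
        return true;
      }
    }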

For named User caches, autowarming doesn't work at all unless you've
specified a regenerator -- which can do whatever it wants using the new
searcher and the information from the old cache.
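
Declaring such a user cache (with its regenerator) in solrconfig.xml
looks something like this -- the cache name, sizes, and regenerator
class below are all illustrative, not real:

    <!-- goes inside the <query> section of solrconfig.xml -->
    <cache name="myUserCache"
           class="solr.LRUCache"
           size="1024"
           initialSize="256"
           autowarmCount="128"
           regenerator="com.example.MyRegenerator"/>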

The documentCache doesn't support autowarming at all (because the key is
doc id, and as you say: those change with every commit).


The reason autowarming is configured using an autowarmCount is so you can
control just how much effort Solr should put into the autowarming of the
new cache ... if you've got a limitless supply of RAM, and an index that
doesn't change very often, you can make your caches so big that no
DocSet/DocList is ever generated dynamically more than once -- but what
happens when your index does finally change? ... if your autowarmCount
is the same as the size of your index, Solr could spend days autowarming
every query ever executed against your index, even if it was only executed
one time 3 weeks ago.  The autowarmCount tells Solr to only warm the N
"best" keys in the cache, where "best" is defined by the Cache
implementation (for an LRUCache, the "best" things are the things most
recently used).
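
In solrconfig.xml terms, that trade-off is the gap between size and
autowarmCount -- for example (the numbers are purely illustrative):

    <!-- of up to 512 cached filters, only the 128 most recently used
         are re-executed against each new searcher -->
    <filterCache class="solr.LRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>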


Once upon a time Yonik and I hypothesized that it would be cool to have
autowarmTimelimit and autowarmPercentage (of current size) params and some
other things like that, so you could have other ways of tweaking just how
much autowarming is done on your behalf ... but they were never built.



-Hoss