Posted to solr-user@lucene.apache.org by Em <ma...@yahoo.de> on 2010/11/10 11:57:21 UTC

To cache or to not cache

Hi List,

In one of our application's use cases we build a response from several
data sources.
Put plainly: we combine the responses from different data sources
(SQL, another web service, and Solr) into one response.

We would cache this combined response per request for a couple of minutes or
hours outside of Solr, since the data to cache does not come only from Solr
itself.
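
As a rough illustration of such an external, time-limited cache, here is a
minimal in-process sketch. It is not a recommendation for a particular
product. The class name `TTLCache`, the method `get_or_compute`, and the
injectable `clock` are all made up for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-limited cache for combined responses (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be checked without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, or compute and store a fresh one."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no data source is touched
        value = compute()  # e.g. query SQL, the web service, and Solr
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

With something like this, the SQL database, the web service, and Solr are
only queried when no fresh entry exists for the request key.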

However, I am not sure whether it would make sense to disable Solr's
internal caches, or at least which of them I can disable, because I am
not sure what impact each cache has in the long run.

A query is usually of type dismax and uses some function queries.
We do not sort, but we may use some filter queries.

Furthermore, we retrieve just one of up to 10 stored fields from our index.
Most of the time (95-98% of the requests) it will be the same field.
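
For illustration, the query string of such a request might be assembled
roughly like this. `defType`, `q`, `fq`, and `fl` are standard Solr request
parameters; the function name `build_select_query` and the field names and
values used with it are assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_select_query(user_query, filters, field):
    """Build the query string for a dismax request returning one stored field."""
    params = [
        ("defType", "dismax"),  # use the dismax query parser
        ("q", user_query),
        ("fl", field),          # only return this one stored field
    ]
    # One fq parameter per filter query.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


print(build_select_query("red firefox", ["category:browser"], "title"))
```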

I think using the filterCache makes sense, but what about the documentCache
and the others?
Since I retrieve the same field from our stored documents in 95-98% of all
cases, how can I speed up retrieving that information?

Thank you!
-- 
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1875289.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: To cache or to not cache

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Nov 10, 2010 at 7:51 AM, Em <ma...@yahoo.de> wrote:

>
> Thank you Shalin.
> Yes, both: Solr and some other applications could possibly run on the same
> box.
> I hoped that not storing the data redundantly, in Solr and somewhere else in
> RAM, would not affect Solr's performance very much.
>
> Just to understand Solr's caching mechanism:
>
> My first query is "red firefox", with all caches turned on.
> If I now search for "red star", does this query make any use of the cache,
> since both share the term "red"?
>
>
Well, we can assume that some documents will be common so the documentCache
will be hit. If you are using a sort on fields or function queries, the
fieldCache built by Lucene (not configurable) will be used. If there are any
common "fq" clauses, those will hit the filterCache. Apart from that, it is
difficult to say unless we know the field types and the parsed query.
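
To make the distinction concrete, here is a deliberately simplified model of
which cache keys two requests can share. It assumes the queryResultCache is
keyed on the whole query plus its filters, the filterCache on each individual
fq clause, and the documentCache on document IDs. The real keys contain more
than this (the sort, for example), and the function names, the filter, and the
document IDs are invented for the example.

```python
def cache_keys(query, filters, doc_ids):
    """Return the (simplified, assumed) keys one request touches per cache."""
    return {
        "queryResultCache": {(query, tuple(sorted(filters)))},
        "filterCache": set(filters),
        "documentCache": set(doc_ids),
    }


def shared_keys(first, second):
    """Keys that a second request could reuse after the first one ran."""
    return {name: first[name] & second[name] for name in first}


first = cache_keys("red firefox", ["in_stock:true"], [1, 2, 3])
second = cache_keys("red star", ["in_stock:true"], [3, 4])
print(shared_keys(first, second))
```

In this toy model the two queries share the fq clause and document 3, but
nothing in the queryResultCache, even though both contain the term "red".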

-- 
Regards,
Shalin Shekhar Mangar.

Re: To cache or to not cache

Posted by Em <ma...@yahoo.de>.
Jonathan,

thanks for your statement. In fact, you are quite right: a lot of people
have developed great caching mechanisms.
However, the solution I had in mind was something like an HTTP cache, in
most cases on the same box.

I talked to some experts who told me that Squid would be a relatively large
monster for us, since we only want it for HTTP caching.

Do you know of any benchmarks of responses per second when most of the
queried data is in the cache?

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1881714.html

Re: To cache or to not cache

Posted by Jonathan Rochkind <ro...@jhu.edu>.
PS: There's also, I think, a way to turn on HTTP-level caching for Solr, 
which I believe is caching of entire responses that match an exact Solr 
query, filled without actually touching Solr at all. But I'm not sure 
about this, because I'm always trying to make sure this HTTP-level cache 
is turned off because it messes me up, rather than looking into the 
details of it.

In general, I doubt you are going to come up with any external caches 
that work better for Solr content than the caches in Solr itself, the 
product of hundreds of developer hours of work focused on Solr 
specifically.


Re: To cache or to not cache

Posted by Jonathan Rochkind <ro...@jhu.edu>.
You know, on further reflection, I'd suggest you think (and ideally 
measure) hard about whether you even need this application-level 
solr-data-cache.

Solr is a caching machine, it's kind of what Solr does, one of the main 
focuses of Solr. A query to Solr that hits the right caches comes back 
amazingly fast.  With properly tuned Solr caches for your use, and 
sufficient RAM to hold them (possibly less than you think, Solr is 
pretty efficient), I'm not sure you're going to get any benefit at all 
from trying to write your own extra cache on top of Solr.

Em wrote:
> Jonathan,
>
> Sounds like that makes sense.
> In this case I think it is more important to size the external cache very
> well, rather than Solr's.
>
> Even when 1/5th of the requests are redundant, an external cache could not
> answer the other 4/5ths, so shrinking Solr's caches would slow down the
> whole application.
>
> Since this is only a conceptual question, I do not have any benchmark
> data yet.
> But if I get some, I will ask whether it is possible to publish it.
>
> Regards
>   

Re: To cache or to not cache

Posted by Em <ma...@yahoo.de>.
Jonathan,

Sounds like that makes sense.
In this case I think it is more important to size the external cache very
well, rather than Solr's.

Even when 1/5th of the requests are redundant, an external cache could not
answer the other 4/5ths, so shrinking Solr's caches would slow down the
whole application.

Since this is only a conceptual question, I do not have any benchmark
data yet.
But if I get some, I will ask whether it is possible to publish it.

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1877245.html

Re: To cache or to not cache

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Em wrote:
> My first query is "red firefox", with all caches turned on.
> If I now search for "red star", does this query make any use of the cache,
> since both share the term "red"?
>   
I don't believe it does, no.

I understand your question -- if you're caching things externally anyway, 
do you need caches in Solr, or is that just redundant?

The answer is kind of complicated though -- maybe, maybe not.  In some 
cases having Solr caches that are too small will make your Solr performance 
really bad -- if you want to page through Solr results, for instance, 
the document cache is going to be important. In fact, if Solr can't hold 
enough for the _current page_ in the cache, that's going to mess up Solr 
even more: even when answering a single request, Solr functions that want 
to look at the documents are (in some cases) going to keep retrieving 
them over and over again, instead of getting them from the cache -- even 
within a single Solr request-response.
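
The effect of a cache that cannot hold the current page can be sketched with
a toy LRU cache. This is only a model under stated assumptions: it assumes
least-recently-used eviction and a component that walks the same page of
documents twice within one request. The names `LRUCache` and
`read_page_twice` are invented for the example, and Solr's real access
pattern may differ.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.hits = 0
        self.misses = 0
        self._data = OrderedDict()

    def get(self, key, load):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = load(key)  # stands in for reading the stored document
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
        return value


def read_page_twice(capacity, page_size=10):
    """Walk one page of documents twice and return (hits, misses)."""
    cache = LRUCache(capacity)
    for _ in range(2):
        for doc_id in range(page_size):
            cache.get(doc_id, load=lambda d: {"id": d})
    return cache.hits, cache.misses


print(read_page_twice(capacity=5))   # cache smaller than the page
print(read_page_twice(capacity=10))  # cache holds the whole page
```

With a capacity of 5 and a page of 10 documents, every lookup in the second
pass misses, because each document has been evicted by the time it is needed
again. With a capacity of 10, the whole second pass is served from the cache.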

I could be wrong about some of those details, this is me kind of 
hand-waving because I'm not an expert at this stuff. I know just enough 
to try not to be dangerous (ha), meaning that I am pretty sure that you 
can't issue a blanket "yeah, get rid of Solr caches" in your 
circumstance.  There are probably some caches you can make (much) 
smaller, but it requires kind of complicated Solr-fu to understand 
which those are.

You could certainly keep your caches fairly small, and see what happens, 
do some benchmarking.

Jonathan



Re: To cache or to not cache

Posted by Em <ma...@yahoo.de>.
Thank you Shalin.
Yes, both: Solr and some other applications could possibly run on the same
box.
I hoped that not storing the data redundantly, in Solr and somewhere else in
RAM, would not affect Solr's performance very much.

Just to understand Solr's caching mechanism:

My first query is "red firefox", with all caches turned on.
If I now search for "red star", does this query make any use of the cache,
since both share the term "red"?

Kind regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1876767.html

Re: To cache or to not cache

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Nov 10, 2010 at 2:57 AM, Em <ma...@yahoo.de> wrote:

>
> Hi List,
>
> In one of our application's use cases we build a response from several
> data sources.
> Put plainly: we combine the responses from different data sources
> (SQL, another web service, and Solr) into one response.
>
> We would cache this combined response per request for a couple of minutes or
> hours outside of Solr, since the data to cache does not come only from Solr
> itself.
>
> However, I am not sure whether it would make sense to disable Solr's
> internal caches, or at least which of them I can disable, because I am
> not sure what impact each cache has in the long run.
>
>
In general, Solr's caches are essential for performance, and they only cache
objects that were required by one or more of your queries. You can
only decrease Solr's performance by turning off its caches. If the reason
for turning off Solr's caches is that your custom cache and Solr are running
on the same box, then you should monitor performance while reducing the sizes
of the caches, and then lower the Solr heap based on your peak memory load
with the reduced cache sizes.
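
One way to do that monitoring is to compare a cache's hit ratio before and
after each size reduction. The sketch below assumes you can sample two
cumulative counters per cache, here called `lookups` and `hits`; the names
are modeled on the counters Solr reports for its caches, but treat them, and
the function `interval_hit_ratio`, as assumptions for this example.

```python
def interval_hit_ratio(before, after):
    """Hit ratio between two samples of cumulative cache counters.

    Returns None when there were no lookups in the interval.
    """
    lookups = after["lookups"] - before["lookups"]
    hits = after["hits"] - before["hits"]
    if lookups == 0:
        return None
    return hits / lookups


# Made-up sample values, purely for illustration.
sample_before = {"lookups": 1000, "hits": 900}
sample_after = {"lookups": 1500, "hits": 1200}
print(interval_hit_ratio(sample_before, sample_after))
```

If the ratio drops sharply after a reduction, that cache was probably made
too small for the workload.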


> A query is usually of type dismax and uses some function queries.
> We do not sort, but we may use some filter queries.
>
> Furthermore, we retrieve just one of up to 10 stored fields from our index.
> Most of the time (95-98% of the requests) it will be the same field.
>
> I think using the filterCache makes sense, but what about the documentCache
> and the others?
> Since I retrieve the same field from our stored documents in 95-98% of all
> cases, how can I speed up retrieving that information?
>
>
The documentCache caches all the stored fields for a document and is not
tunable to cache particular fields only.

-- 
Regards,
Shalin Shekhar Mangar.