Posted to solr-user@lucene.apache.org by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID> on 2016/04/11 15:39:14 UTC

Cache problem

Dear Solr experts :),

I read this very interesting post 'Understanding and tuning your Solr 
caches <https://teaspoon-consulting.com/articles/solr-cache-tuning.html>' !
This is the only good document that I was able to find after searching 
for 1 day!

/I had been using Solr for 2 years without knowing in detail what it was
caching... (because I did not need to understand it before)./
/I had to take a look since I needed to restart my Tomcat regularly
in order to improve performance.../

But I now have 2 questions:
1) *How can I know how much RAM my Solr is really using* (especially
for caching)?
2) Could you have a quick look into the following images and tell me if 
I'm doing something wrong?

Note: my index contains 66 million articles with several stored text
fields.

<mime-attachment.png>

/My Solr contains several cores (~80 GB all together), but almost
only the one below is used./

I have the feeling that a lot of data is always stored in RAM... and
it is getting bigger and bigger all the time...

<mime-attachment.png>
<mime-attachment.png>

(after restart)
$ sudo tail -f /var/log/tomcat7/catalina.out | grep GC
<mime-attachment.png>
[...] after a few minutes
<mime-attachment.png>

Here are some images showing some stats about my Solr
performance...

<mime-attachment.png>
<mime-attachment.png>
<mime-attachment.png>

<mime-attachment.png>

Kind regards,
Bastien Latard



Re: Cache problem

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/13/2016 4:34 AM, Bastien Latard - MDPI AG wrote:
> Thank you all again for your good and detailed answer.
> I will combine all of them to try to build a better environment.
>
> *Just a last question...*
> /I don't remember exactly when I needed to increase the java heap.../
> /but is it possible that this was for the DataImport.../
>
> *Would the DIH work if it cannot "load" the temporary index into the
> java heap in the full-index mode?*
> I thought that's why I needed to increase this value...but I might be
> confused!

The default behavior on many JDBC drivers is to load the *entire* SQL
result into memory *before* sending those results to the requesting
application.  This is the way the MySQL driver behaves by default, and
the way that older versions of the Microsoft driver for SQL Server
behave by default.

There should be a way to tell the JDBC driver to stream the results back
instead of loading them into memory.  For MySQL, you just have to set
the batchSize parameter in the DIH config to -1, which causes the
underlying code to do "setFetchSize(Integer.MIN_VALUE)".  For SQL
Server, you need a recent version of the driver, in which the default
behavior was changed.  For other databases, you may need a JDBC URL parameter.

Thanks,
Shawn


Re: Cache problem

Posted by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID>.
Thank you all again for your good and detailed answers.
I will combine all of them to try to build a better environment.

*Just a last question...*
/I don't remember exactly when I needed to increase the java heap.../
/but is it possible that this was for the DataImport.../

*Would the DIH work if it cannot "load" the temporary index into the 
java heap in the full-index mode?*
I thought that's why I needed to increase this value...but I might be 
confused!

kind regards,
Bastien

Kind regards,
Bastien Latard
Web engineer
-- 
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
latard@mdpi.com
http://www.mdpi.com/


Re: Cache problem

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/13/2016 12:57 AM, Bastien Latard - MDPI AG wrote:
> Thank you Shawn & Reth!
>
> So I have now some questions, again
>
>
> Remind: I have only Solr running on this server (i.e.: java + tomcat).
>
> /BTW: I needed to increase previously the java heap size because I
> went out of memory. Actually, you only see here 2Gb (8Gb previously)
> for JVM because I automatically restart tomcat for a better
> performance every 30 minutes if no DIH running./

If you size your heap appropriately and properly tune garbage
collection, restarting like this should be unnecessary.

> Question #1:
> From the picture above, we see Physical memory: ~60Gb
> *  -> is this because of -Xmx40960m AND -XX:MaxPermSize=20480m ? *

I don't actually know whether permgen is allocated from the heap, or *in
addition* to the heap.  Your current allocated heap size is 20GB, which
means that at most Java is taking up 30GB, but it might be just 20GB. 
The other 30-40GB is used by the operating system -- for disk caching
(the page cache).  It's perfectly normal for physical memory to be
almost completely maxed out.  The physical memory graph is nearly
useless for troubleshooting.

Here's a screenshot of one of my servers:

https://www.dropbox.com/s/55d4x33tpyyaoff/solr-dashboard-physical-mem.png?dl=0

Notice that the max heap here is 8GB ... yet physical memory has 59GB
allocated -- 95 percent.  There are some additional java processes
taking up a few GB, but the vast majority of the memory is used by the
OS page cache.
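
The arithmetic behind the "30-40GB" estimate for the 60GB server can be written out as a small Python sketch. The function name is made up for illustration, and the 10GB permgen figure is only the upper bound discussed above (whether permgen counts on top of the heap is left open in this thread):

```python
def page_cache_headroom_gb(physical_gb, heap_gb, permgen_gb=0):
    """Return (low, high) GB left for the OS page cache.

    low  assumes permgen is allocated in addition to the heap,
    high assumes it is not (or that it is barely used).
    """
    high = physical_gb - heap_gb
    low = high - permgen_gb
    return low, high


# Figures from the thread: ~60 GB physical, 20 GB heap currently allocated,
# up to 10 GB of permgen.
low, high = page_cache_headroom_gb(60, 20, permgen_gb=10)
print(f"Roughly {low}-{high} GB remain for the page cache")
```

Whatever the exact split, the remainder is what the OS can use to cache index files, which is why a smaller heap leaves more room for the index.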

> Question #2:
> /"The OS caches the actual index files"./
>
> *Does this mean that OS will try to cache 47.48Gb for this index? (if
> not, how can I know the size of the cache)
> */Or are you speaking about page cache
> <https://en.wikipedia.org/wiki/Page_cache>?/

I am talking about the page cache, also known as the disk cache.  The OS
will potentially use *all* unassigned memory for the page cache.  You
can ask your operating system how much memory is being used for this
purpose.
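
On Linux, one way to ask is to read /proc/meminfo, where the "Cached" field reports the page cache size in kB. The following is a minimal Python sketch, assuming a Linux host; the function names are hypothetical and other operating systems need a different approach:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def page_cache_gib(meminfo):
    """Return the page cache size (the "Cached" field, in kB) in GiB."""
    return meminfo["Cached"] / (1024 * 1024)


def read_page_cache_gib(path="/proc/meminfo"):
    """Read the live value from the running Linux kernel."""
    with open(path) as f:
        return page_cache_gib(parse_meminfo(f.read()))
```

Calling read_page_cache_gib() on the Solr host gives a number that can be compared with the on-disk index size to see how much of the index can fit in the cache.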

> Question #3:
> /"documentCache does live in Java heap"
> /*Is there a way to know the real size used/needed by this caching?*

Solr does not report memory usage with that much detail.  Perhaps one
day it will, but we're not there yet.  The size of an entry in the
documentCache should be approximately the size of the stored data for
that document, plus Java overhead required to hold the data.  The
filterCache is the one that usually uses a large amount of memory.
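
To get a feel for why the filterCache tends to dominate, here is a rough Python estimate. It assumes (this is not stated in the thread) that a large filterCache entry is held as a bitset with one bit per document in the index, counted by maxDoc, which includes deleted documents; small result sets may be stored more compactly. The 512-entry cache size is a hypothetical value, and the function names are made up for illustration:

```python
def filter_cache_entry_bytes(max_doc):
    """Approximate size of one bitset-backed entry: one bit per document."""
    return (max_doc + 7) // 8


def filter_cache_bytes(max_doc, entries):
    """Worst-case size if every cached filter is a full bitset."""
    return filter_cache_entry_bytes(max_doc) * entries


max_doc = 66_000_000  # article count mentioned in the thread
entries = 512         # hypothetical filterCache size

per_entry_mb = filter_cache_entry_bytes(max_doc) / 1e6
total_gb = filter_cache_bytes(max_doc, entries) / 1e9
print(f"~{per_entry_mb:.2f} MB per entry, ~{total_gb:.2f} GB if the cache fills up")
```

Under these assumptions a full cache would occupy several GB of heap, so the configured filterCache size is worth checking before settling on a heap size.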

Thanks,
Shawn


Re: Cache problem

Posted by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID>.
Thank you Shawn & Reth!

So I now have some more questions.


Reminder: I have only Solr running on this server (i.e. Java + Tomcat).

/BTW: I previously needed to increase the Java heap size because I ran
out of memory. Actually, you only see 2 GB here (8 GB previously) for the JVM
because I automatically restart Tomcat every 30 minutes for better
performance if no DIH is running./

Question #1:
From the picture above, we see Physical memory: ~60 GB
*  -> is this because of -Xmx40960m AND -XX:MaxPermSize=20480m? *

Question #2:
/"The OS caches the actual index files"./

*Does this mean that the OS will try to cache 47.48 GB for this index? (If
not, how can I know the size of the cache?)*
/Or are you speaking about the page cache
<https://en.wikipedia.org/wiki/Page_cache>?/

Question #3:
/"documentCache does live in Java heap"/
*Is there a way to know the real size used/needed by this caching?*

Thanks for your help.

Kind regards,
Bastien



Re: Cache problem

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/12/2016 3:35 AM, Bastien Latard - MDPI AG wrote:
> Thank you both, Bill and Reth!
>
> Here is my current options from my command to launch java:
> */usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m
> -XX:MaxPermSize=20480m [...]*
>
> So should I do *-Xms20480m -Xmx20480m*?
> Why? What would it change?

You do *NOT* need a 10GB permsize.  That's a definite waste of memory --
most of it will never get used.  It's probably best to let Java handle
the permgen.  This generation is entirely eliminated in Java 8.  In Java
7, the permsize usually doesn't need adjusting ... but if it does, Solr
probably wouldn't even start without an adjustment.

Regarding something said in another reply on this thread:  The
documentCache *does* live in the Java heap, not the OS memory.  The OS
caches the actual index files, and documentCache is maintained by Solr
itself, separately from that.

It is highly unlikely that you will ever need a 40GB heap.  You might
not even need a 20GB heap.  As I said earlier:  Based on what I saw in
your screenshots, I think you can run with an 8g heap (-Xms8g -Xmx8g),
but you might need to try 12g instead.
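
To compare a launch command against this advice, the memory flags can be pulled out and converted to bytes. This is a small Python sketch; the helper names are made up for illustration, and it only understands the four flags that appear in this thread:

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}
_PREFIXES = ("-Xms", "-Xmx", "-XX:PermSize=", "-XX:MaxPermSize=")


def parse_size(value):
    """Convert a JVM size such as '20480m' or '8g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"unrecognized JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def memory_flags(command_line):
    """Return {flag: bytes} for the memory flags found in a java command line."""
    flags = {}
    for token in command_line.split():
        for prefix in _PREFIXES:
            if token.startswith(prefix):
                flags[prefix.rstrip("=")] = parse_size(token[len(prefix):])
    return flags


cmd = "/usr/bin/java -Xms20480m -Xmx40960m -XX:PermSize=10240m -XX:MaxPermSize=20480m"
for flag, size in memory_flags(cmd).items():
    print(f"{flag}: {size / 1024**3:.0f} GiB")
```

For the command quoted above this shows a 20 GiB initial heap, a 40 GiB maximum heap, and 10/20 GiB permgen settings, all well above the 8g to 12g heap suggested here.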

Thanks,
Shawn


Re: Cache problem

Posted by Reth RM <re...@gmail.com>.
This explains why leaving enough memory to the OS is important:
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
According to the Solr admin dashboard, physical memory (the OS cache) is
almost fully utilized, whereas the memory allocated to the JVM is not all
used, so it is best to lower the JVM memory.
Why set Xms = Xmx? This link pretty much answers it:
http://stackoverflow.com/questions/16087153/what-happens-when-we-set-xmx-and-xms-equal-size




Re: Cache problem

Posted by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID>.
Thank you both, Bill and Reth!

Here are the current options from my command to launch Java:
*/usr/bin/java  -Xms20480m -Xmx40960m -XX:PermSize=10240m
-XX:MaxPermSize=20480m [...]*

So should I use *-Xms20480m -Xmx20480m*?
Why? What would it change?

Reminder: the size of my main index is 46 GB... (80 GB all together)



BTW: what's the difference between dark and light grey in the JVM
representation? (real/virtual memory?)


NOTE: I have only Tomcat running on this server (and this is my live
website - /i.e.: quite critical/).

So if the document cache is using the OS cache, this might be the problem,
right?
(because it seems to cache every field ==> so all the data returned by
the query)

kr,
Bast



Re: Cache problem

Posted by Reth RM <re...@gmail.com>.
According to the Solr admin dashboard's memory report, the Solr JVM is not
using more than 20 GB of memory, whereas physical memory is almost full.  I'd
set Xms = Xmx = 16 GB and let the operating system use the rest. Regarding
caches: the filter cache hit ratio looks good, so it should not be a concern.
And AFAIK, the document cache actually uses the OS cache. Overall, I'd reduce
the memory allocated to the JVM as said above and try.





Re: Cache problem

Posted by bi...@gmail.com.
You probably do need to optimize to get rid of the deleted docs...

That is a lot of deleted docs.

Bill Bell
Sent from mobile

