Posted to solr-user@lucene.apache.org by Simon Fairey <si...@gmail.com> on 2014/10/06 17:24:59 UTC

Solr configuration, memory usage and MMapDirectory

Hi

I've inherited a Solr config and am doing some sanity checks before making
some updates; I'm concerned about the memory settings.

The system has 1 index in 2 shards split across 2 Ubuntu 64-bit nodes; each
node has 32 CPU cores and 132GB RAM. We index around 500k files a day, spread
out over the day in batches every 10 minutes; a portion of these, maybe
5-10%, are updates to existing content. Currently MergeFactor is set to 2 and
the commit settings are:

<autoCommit>

    <maxTime>60000</maxTime>

    <openSearcher>false</openSearcher>

</autoCommit>

<autoSoftCommit>

    <maxTime>900000</maxTime>

</autoSoftCommit>

Currently each node has around 25M docs in with an index size of 45GB, we
prune the data every few weeks so it never gets much above 35M docs per
node.

On reading I've seen a recommendation that we should be using MMapDirectory,
currently it's set to NRTCachingDirectoryFactory. However currently the JVM
is configured with -Xmx131072m, and for MMapDirectory I've read you should
use less memory for the JVM so there is more available for the OS caching.

Looking at the dashboard in the JVM memory usage I see:

[image: JVM memory usage graph from the admin dashboard -- stripped by the mailing list]

I'm not sure I understand the 3 bands; I assume 127.81 is the max, the dark
grey is what's in use at the moment, and the light grey is memory that was
allocated previously but hasn't been cleaned up yet?

I'm trying to understand whether this will help me work out a good value to
change Xmx to, i.e. say 64GB based on the light grey?

Additionally, once I've changed the max heap size, is it a simple case of
changing the config to use MMapDirectory, or are there things I need to watch
out for?

Thanks

Si

 


RE: Solr configuration, memory usage and MMapDirectory

Posted by Simon Fairey <si...@gmail.com>.
Thanks I will have a read and digest this.

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: 06 October 2014 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/6/2014 9:24 AM, Simon Fairey wrote:
> I've inherited a Solr config and am doing some sanity checks before 
> making some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, 
> each node has 32 CPU cores and 132GB RAM, we index around 500k files a 
> day spread out over the day in batches every 10 minutes, a portion of 
> these are updates to existing content, maybe 5-10%. Currently 
> MergeFactor is set to 2 and commit settings are:
>
> <autoCommit>
>
>     <maxTime>60000</maxTime>
>
>     <openSearcher>false</openSearcher>
>
> </autoCommit>
>
> <autoSoftCommit>
>
>     <maxTime>900000</maxTime>
>
> </autoSoftCommit>
>
> Currently each node has around 25M docs in with an index size of 45GB, 
> we prune the data every few weeks so it never gets much above 35M docs 
> per node.
>
> On reading I've seen a recommendation that we should be using 
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory.
> However currently the JVM is configured with -Xmx131072m, and for 
> MMapDirectory I've read you should use less memory for the JVM so 
> there is more available for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph -- attachment stripped]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is 
> in use at the moment and the light grey is allocated as it was used 
> previously but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a 
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case 
> of changing the config to use MMapDirectory or are there things i need 
> to watch out for?
>

NRTCachingDirectoryFactory is a wrapper directory implementation. The wrapped Directory implementation is used with some code between that implementation and the consumer (Solr in this case) that does caching for NRT indexing.  The wrapped implementation is MMapDirectory, so you do not need to switch, you ARE using MMap.
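
For reference, the stock solrconfig.xml in Solr 4.x declares the directory factory along these lines (a sketch; your inherited config may word it slightly differently):

```xml
<!-- NRTCachingDirectoryFactory wraps the default directory, which on
     64-bit systems is MMapDirectory -- so mmap is already in use. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```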

Attachments rarely make it to the list, and that has happened in this case, so I cannot see any of your pictures.  Instead, look at one of mine, and the output of a command from the same machine, running Solr
4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G     /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million documents.  I've also got 64GB of RAM.  The solr process shows a virtual memory size of 54GB, a resident size of 16GB, and a shared size of 11GB.  My max heap on this process is 6GB.  If you deduct the shared memory size from the resident size, you get about 5GB.  The admin dashboard for this machine says the current max heap size is 5.75GB, so that 5GB is pretty close to that, and probably matches up really well when you consider that the resident size may be considerably more than 16GB and the shared size may be just barely over 11GB.
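
As a quick sanity check of that arithmetic, using the example figures above (not your own numbers):

```shell
# RES minus SHR approximates memory the JVM itself has allocated (the heap),
# since SHR is dominated by mmapped index pages shared with the OS page cache.
awk 'BEGIN { res_gb = 16; shr_gb = 11; printf "%d\n", res_gb - shr_gb }'
```

That ~5GB lines up with the 5.75GB max heap the dashboard reports.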

My system has well over 9GB free memory and 44GB is being used for the OS disk cache.  This system is NOT facing memory pressure.  The index is well-cached and there is even memory that is not used *at all*.

With an index size of 45GB and 132GB of RAM, you're unlikely to be having problems with memory unless your heap size is *ENORMOUS*.  You *should* have your garbage collection highly tuned, especially if your max heap is larger than 2 or 3GB.  I would guess that a 4 to 6GB heap is probably enough for your needs, unless you're doing a lot with facets, sorting, or Solr's caches, in which case you may need more.  Here's some info about heap requirements, followed by information about garbage collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Your automatic commit settings do not raise any red flags with me. 
Those are sensible settings.

Thanks,
Shawn



RE: Solr configuration, memory usage and MMapDirectory

Posted by Simon Fairey <si...@gmail.com>.
Hi

Thanks for this; I will investigate further after reading a number of your points in more detail. I do have a feeling they've set up too many entries in the filter cache (1000s), so I will revisit that.

Just a note on numbers: those were valid when I made the post, but obviously they change as the week progresses before a regular clean-up of content. Current numbers for info (if it's at all relevant) from the index admin view on one of the 2 nodes are:

Last Modified:	18 minutes ago
Num Docs:    	24590368
Max Doc:    	29139255
Deleted Docs: 	4548887
Version:  		1297982
Segment Count: 	28
	
           Version        Gen     Size
Master:    1412798583558  402364  52.98 GB

Top:
2996 tomcat6   20   0  189g  73g 1.5g S   15 58.7  58034:04 java

And the only GC option I can see that is on is "-XX:+UseConcMarkSweepGC".

Regarding the XY problem, you are very likely correct. Unfortunately I wasn't involved in the config, and I very much suspect that when it was done many of the defaults were used; then, if it didn't work or there was, say, an out-of-memory error, they just upped the heap to solve the symptom without investigating the cause. The luxury of having more than enough RAM, I guess!

I'm going to get some late-night downtime soon, at which point I'm hoping to change the heap size and GC settings and add JMX; it's not exposed to the internet, so no security is fine.

Right off to do some reading!

Cheers

Si

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: 08 October 2014 21:09
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/8/2014 4:02 AM, Simon Fairey wrote:
> I'm currently setting up jconsole but as I have to remotely monitor (no gui capability on the server) I have to wait before I can restart solr with a JMX port setup. In the meantime I looked at top and given the calculations you said based on your top output and this top of my java process from the node that handles the querying, the indexing node has a similar memory profile:
> 
> https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
> 
> It would seem I need a monstrously large heap in the 60GB region?
> 
> We do use a lot of navigators/filters so I have set the caches to be quite large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably have more than 45GB of index data.  This might be a combination of old indexes and the active index.  Only the indexes (cores) that are being actively used need to be considered when trying to calculate the total RAM needed.  Other indexes will not affect performance, even though they increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data that has been opened on the disk.  This is not memory that's actually allocated.  There's a very good reason that mmap has been the default in Lucene and Solr for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of index data on each node.  With those numbers and a conservative configuration, I would expect that you need about 4GB of heap, maybe as much as 8GB.  I cannot think of any reason that you would NEED a heap of 60GB or larger.

Each field that you sort on, each field that you facet on with the default facet.method of fc, and each filter that you cache will use a large block of memory.  The size of that block of memory is almost exclusively determined by the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately 3MB -- one bit for every document.  I do not know how big each FieldCache entry is for a sort field and a facet field, but assume that they are probably larger than the 3MB entries on the filterCache.
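
That ~3MB figure falls straight out of the bitset math, one bit per document:

```shell
# 25 million docs -> bits -> bytes -> megabytes per cached filter entry
awk 'BEGIN { printf "%.2f MB\n", 25000000 / 8 / 1024 / 1024 }'
```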

I've got a filterCache sized at 64, with an autowarmCount of 4.  With larger autowarmCount values, I was seeing commits take 30 seconds or more, because each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a lot of memory with no benefit.  Large autowarmCount values are also rarely necessary.  Every time a new searcher is opened by a commit, add up all your autowarmCount values and realize that the searcher likely needs to execute that many queries before it is available.
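
In solrconfig.xml terms, a conservative setup along the lines I'm describing would look something like this (a sketch, not a prescription -- tune to your own hit rates):

```xml
<filterCache class="solr.FastLRUCache"
             size="64"
             initialSize="64"
             autowarmCount="4"/>
```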

If you need to set up remote JMX so you can remotely connect jconsole, I have done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if you use that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.
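
The usual JVM flags for unauthenticated remote JMX look roughly like this (the port number here is arbitrary -- pick your own, and keep it firewalled):

```shell
# Append to the JVM options in your servlet container startup.
# No SSL, no auth -- do not expose this port beyond your own network.
JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=18983 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false"
```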

We might need to back up a little bit and start with the problem that you are trying to figure out, not the numbers that are being reported.

http://people.apache.org/~hossman/#xyproblem

Your original note said that you're sanity checking.  Toward that end, the only insane thing that jumps out at me is that your max heap is
*VERY* large, and you probably don't have the proper GC tuning.

My recommendations for initial action are to use -Xmx8g on the servlet container startup and include the GC settings you can find on the wiki pages I've given you.  It would be a very good idea to set up remote JMX so you can use jconsole or jvisualvm remotely.
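
As a concrete starting point for Tomcat (the GC flag values here are illustrative, not the wiki's exact tuned settings -- take those from the pages I linked):

```shell
# e.g. in setenv.sh or the init script; CMS collector, 8GB fixed heap
CATALINA_OPTS="-Xms8g -Xmx8g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+ParallelRefProcEnabled"
```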

Thanks,
Shawn



Re: Solr configuration, memory usage and MMapDirectory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/8/2014 4:02 AM, Simon Fairey wrote:
> I'm currently setting up jconsole but as I have to remotely monitor (no gui capability on the server) I have to wait before I can restart solr with a JMX port setup. In the meantime I looked at top and given the calculations you said based on your top output and this top of my java process from the node that handles the querying, the indexing node has a similar memory profile:
> 
> https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0
> 
> It would seem I need a monstrously large heap in the 60GB region?
> 
> We do use a lot of navigators/filters so I have set the caches to be quite large for these, are these what are using up the memory?

With a VIRT size of 189GB and a RES size of 73GB, I believe you probably
have more than 45GB of index data.  This might be a combination of old
indexes and the active index.  Only the indexes (cores) that are being
actively used need to be considered when trying to calculate the total
RAM needed.  Other indexes will not affect performance, even though they
increase your virtual memory size.

With MMap, part of the virtual memory size is the size of the index data
that has been opened on the disk.  This is not memory that's actually
allocated.  There's a very good reason that mmap has been the default in
Lucene and Solr for more than two years.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

You stated originally that you have 25 million documents and 45GB of
index data on each node.  With those numbers and a conservative
configuration, I would expect that you need about 4GB of heap, maybe as
much as 8GB.  I cannot think of any reason that you would NEED a heap
60GB or larger.

Each field that you sort on, each field that you facet on with the
default facet.method of fc, and each filter that you cache will use a
large block of memory.  The size of that block of memory is almost
exclusively determined by the number of documents in the index.

With 25 million documents, each filterCache entry will be approximately
3MB -- one bit for every document.  I do not know how big each
FieldCache entry is for a sort field and a facet field, but assume that
they are probably larger than the 3MB entries on the filterCache.

I've got a filterCache sized at 64, with an autowarmCount of 4.  With
larger autowarmCount values, I was seeing commits take 30 seconds or
more, because each of those filters can take a few seconds to execute.
Cache sizes in the thousands are rarely necessary, and just chew up a
lot of memory with no benefit.  Large autowarmCount values are also
rarely necessary.  Every time a new searcher is opened by a commit, add
up all your autowarmCount values and realize that the searcher likely
needs to execute that many queries before it is available.

If you need to set up remote JMX so you can remotely connect jconsole, I
have done this in the redhat init script I've built -- see JMX_OPTS here:

http://wiki.apache.org/solr/ShawnHeisey#Init_script

It's never a good idea to expose Solr directly to the internet, but if
you use that JMX config, *definitely* don't expose it to the Internet.
It doesn't use any authentication.

We might need to back up a little bit and start with the problem that
you are trying to figure out, not the numbers that are being reported.

http://people.apache.org/~hossman/#xyproblem

Your original note said that you're sanity checking.  Toward that end,
the only insane thing that jumps out at me is that your max heap is
*VERY* large, and you probably don't have the proper GC tuning.

My recommendations for initial action are to use -Xmx8g on the servlet
container startup and include the GC settings you can find on the wiki
pages I've given you.  It would be a very good idea to set up remote JMX
so you can use jconsole or jvisualvm remotely.

Thanks,
Shawn


RE: Solr configuration, memory usage and MMapDirectory

Posted by Simon Fairey <si...@gmail.com>.
Hi

I'm currently setting up jconsole, but as I have to monitor remotely (no GUI capability on the server) I have to wait before I can restart Solr with a JMX port set up. In the meantime I looked at top; given the calculations you described based on your top output, here is the top output of my java process from the node that handles the querying (the indexing node has a similar memory profile):

https://www.dropbox.com/s/pz85dm4e7qpepco/SolrTop.png?dl=0

It would seem I need a monstrously large heap in the 60GB region?

We do use a lot of navigators/filters, so I have set the caches to be quite large for these; are these what are using up the memory?

Thanks

Si

-----Original Message-----
From: Shawn Heisey [mailto:apache@elyograg.org] 
Sent: 06 October 2014 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

On 10/6/2014 9:24 AM, Simon Fairey wrote:
> I've inherited a Solr config and am doing some sanity checks before 
> making some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, 
> each node has 32 CPU cores and 132GB RAM, we index around 500k files a 
> day spread out over the day in batches every 10 minutes, a portion of 
> these are updates to existing content, maybe 5-10%. Currently 
> MergeFactor is set to 2 and commit settings are:
>
> <autoCommit>
>
>     <maxTime>60000</maxTime>
>
>     <openSearcher>false</openSearcher>
>
> </autoCommit>
>
> <autoSoftCommit>
>
>     <maxTime>900000</maxTime>
>
> </autoSoftCommit>
>
> Currently each node has around 25M docs in with an index size of 45GB, 
> we prune the data every few weeks so it never gets much above 35M docs 
> per node.
>
> On reading I've seen a recommendation that we should be using 
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory.
> However currently the JVM is configured with -Xmx131072m, and for 
> MMapDirectory I've read you should use less memory for the JVM so 
> there is more available for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph -- attachment stripped]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is 
> in use at the moment and the light grey is allocated as it was used 
> previously but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a 
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case 
> of changing the config to use MMapDirectory or are there things i need 
> to watch out for?
>

NRTCachingDirectoryFactory is a wrapper directory implementation. The wrapped Directory implementation is used with some code between that implementation and the consumer (Solr in this case) that does caching for NRT indexing.  The wrapped implementation is MMapDirectory, so you do not need to switch, you ARE using MMap.

Attachments rarely make it to the list, and that has happened in this case, so I cannot see any of your pictures.  Instead, look at one of mine, and the output of a command from the same machine, running Solr
4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G     /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million documents.  I've also got 64GB of RAM.  The solr process shows a virtual memory size of 54GB, a resident size of 16GB, and a shared size of 11GB.  My max heap on this process is 6GB.  If you deduct the shared memory size from the resident size, you get about 5GB.  The admin dashboard for this machine says the current max heap size is 5.75GB, so that 5GB is pretty close to that, and probably matches up really well when you consider that the resident size may be considerably more than 16GB and the shared size may be just barely over 11GB.

My system has well over 9GB free memory and 44GB is being used for the OS disk cache.  This system is NOT facing memory pressure.  The index is well-cached and there is even memory that is not used *at all*.

With an index size of 45GB and 132GB of RAM, you're unlikely to be having problems with memory unless your heap size is *ENORMOUS*.  You *should* have your garbage collection highly tuned, especially if your max heap is larger than 2 or 3GB.  I would guess that a 4 to 6GB heap is probably enough for your needs, unless you're doing a lot with facets, sorting, or Solr's caches, in which case you may need more.  Here's some info about heap requirements, followed by information about garbage collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Your automatic commit settings do not raise any red flags with me. 
Those are sensible settings.

Thanks,
Shawn



Re: Solr configuration, memory usage and MMapDirectory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/2014 9:24 AM, Simon Fairey wrote:
> I've inherited a Solr config and am doing some sanity checks before
> making some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes,
> each node has 32 CPU cores and 132GB RAM, we index around 500k files a
> day spread out over the day in batches every 10 minutes, a portion of
> these are updates to existing content, maybe 5-10%. Currently
> MergeFactor is set to 2 and commit settings are:
>
> <autoCommit>
>
>     <maxTime>60000</maxTime>
>
>     <openSearcher>false</openSearcher>
>
> </autoCommit>
>
> <autoSoftCommit>
>
>     <maxTime>900000</maxTime>
>
> </autoSoftCommit>
>
> Currently each node has around 25M docs in with an index size of 45GB,
> we prune the data every few weeks so it never gets much above 35M docs
> per node.
>
> On reading I've seen a recommendation that we should be using
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory.
> However currently the JVM is configured with -Xmx131072m, and for
> MMapDirectory I've read you should use less memory for the JVM so
> there is more available for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph -- attachment stripped]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is
> in use at the moment and the light grey is allocated as it was used
> previously but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case
> of changing the config to use MMapDirectory or are there things i need
> to watch out for?
>

NRTCachingDirectoryFactory is a wrapper directory implementation. The
wrapped Directory implementation is used with some code between that
implementation and the consumer (Solr in this case) that does caching
for NRT indexing.  The wrapped implementation is MMapDirectory, so you
do not need to switch, you ARE using MMap.

Attachments rarely make it to the list, and that has happened in this
case, so I cannot see any of your pictures.  Instead, look at one of
mine, and the output of a command from the same machine, running Solr
4.7.2 with Oracle Java 7:

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

[root@idxa1 ~]# du -sh /index/solr4/data/
64G     /index/solr4/data/

I've got 64GB of index data on this machine, used by about 56 million
documents.  I've also got 64GB of RAM.  The solr process shows a virtual
memory size of 54GB, a resident size of 16GB, and a shared size of
11GB.  My max heap on this process is 6GB.  If you deduct the shared
memory size from the resident size, you get about 5GB.  The admin
dashboard for this machine says the current max heap size is 5.75GB, so
that 5GB is pretty close to that, and probably matches up really well
when you consider that the resident size may be considerably more than
16GB and the shared size may be just barely over 11GB.

My system has well over 9GB free memory and 44GB is being used for the
OS disk cache.  This system is NOT facing memory pressure.  The index is
well-cached and there is even memory that is not used *at all*.

With an index size of 45GB and 132GB of RAM, you're unlikely to be
having problems with memory unless your heap size is *ENORMOUS*.  You
*should* have your garbage collection highly tuned, especially if your
max heap is larger than 2 or 3GB.  I would guess that a 4 to 6GB heap is
probably enough for your needs, unless you're doing a lot with facets,
sorting, or Solr's caches, in which case you may need more.  Here's some info
about heap requirements, followed by information about garbage
collection tuning:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Your automatic commit settings do not raise any red flags with me. 
Those are sensible settings.

Thanks,
Shawn


Re: Solr configuration, memory usage and MMapDirectory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/2014 10:07 AM, Simon Fairey wrote:
> Thanks and yeah I thought it might be crazy, the image is just the JVM memory usage you get from the dashboard on the solr admin pages, the JVM one has what appears to be a light grey then dark grey band then some blank space, those are the numbers I referred to if that makes sense?
>
> Bit of quick ascii art to represent JVM memory usage image:
>
> ##########====----------------
>
> As I look now:
> -	Ends at 127.81GB
> = 	Ends at 67.30GB
> # 	Ends at 39.24GB
>
> My guess was that the light grey is used memory that hasn't been garbage collected and the total bar is equivalent to the max heap setting?

If that's your JVM memory graph, then your max heap is *WAY* too big --
it's 128GB, and 67GB of that has actually been allocated.  Even with a
highly tuned GC, you're likely to have occasional long GC pauses ... and
if you haven't tuned your GC, those pauses could last for a minute or more.

As I said in my earlier reply, 4 to 6GB is probably enough for your max
heap.  The page that I linked on that reply describes using jconsole or
jvisualvm to connect to your running Solr instance and watch the memory
utilization graphs, to see just how much memory is required for the max
heap.

Thanks,
Shawn


RE: Solr configuration, memory usage and MMapDirectory

Posted by Simon Fairey <si...@gmail.com>.
Thanks, and yeah, I thought it might be crazy. The image is just the JVM memory usage you get from the dashboard on the Solr admin pages; the JVM one has what appears to be a light grey band, then a dark grey band, then some blank space. Those are the numbers I referred to, if that makes sense?

Bit of quick ascii art to represent JVM memory usage image:

##########====----------------

As I look now:
-	Ends at 127.81GB
= 	Ends at 67.30GB
# 	Ends at 39.24GB

My guess was that the light grey is used memory that hasn't been garbage collected and the total bar is equivalent to the max heap setting?

Will digest the blog.

Cheers

Si

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 06 October 2014 16:58
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration, memory usage and MMapDirectory

First, e-mail programs tend to strip attachments, so your screenshot didn't come through. You can paste it up somewhere and provide a link if you still need us to see it.

That said....

-Xmx131072m

This is insane, you're absolutely right to focus on that first. Here's Uwe's excellent blog on the subject, with hints on how to read top:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Meanwhile, Shawn gave you some very good info so I won't repeat any....

On Mon, Oct 6, 2014 at 8:24 AM, Simon Fairey <si...@gmail.com> wrote:

> Hi
>
> I've inherited a Solr config and am doing some sanity checks before 
> making some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, 
> each node has 32 CPU cores and 132GB RAM, we index around 500k files a 
> day spread out over the day in batches every 10 minutes, a portion of 
> these are updates to existing content, maybe 5-10%. Currently 
> MergeFactor is set to 2 and commit settings are:
>
> <autoCommit>
>
>     <maxTime>60000</maxTime>
>
>     <openSearcher>false</openSearcher>
>
> </autoCommit>
>
> <autoSoftCommit>
>
>     <maxTime>900000</maxTime>
>
> </autoSoftCommit>
>
> Currently each node has around 25M docs in with an index size of 45GB, 
> we prune the data every few weeks so it never gets much above 35M docs 
> per node.
>
> On reading I've seen a recommendation that we should be using 
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory. 
> However currently the JVM is configured with -Xmx131072m, and for 
> MMapDirectory I've read you should use less memory for the JVM so 
> there is more available for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph -- attachment stripped]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is 
> in use at the moment and the light grey is allocated as it was used 
> previously but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a 
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case 
> of changing the config to use MMapDirectory or are there things i need 
> to watch out for?
>
> Thanks
>
> Si
>
>
>


Re: Solr configuration, memory usage and MMapDirectory

Posted by Erick Erickson <er...@gmail.com>.
First, e-mail programs tend to strip attachments, so your screenshot
didn't come through. You can paste it up somewhere and provide a link if you
still need us to see it.

That said....

-Xmx131072m

This is insane, you're absolutely right to focus on that first. Here's
Uwe's excellent blog on the subject, with hints on how to read top:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Meanwhile, Shawn gave you some very good info so I won't repeat any....

On Mon, Oct 6, 2014 at 8:24 AM, Simon Fairey <si...@gmail.com> wrote:

> Hi
>
> I've inherited a Solr config and am doing some sanity checks before making
> some updates, I'm concerned about the memory settings.
>
> System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, each
> node has 32 CPU cores and 132GB RAM, we index around 500k files a day
> spread out over the day in batches every 10 minutes, a portion of these are
> updates to existing content, maybe 5-10%. Currently MergeFactor is set to 2
> and commit settings are:
>
> <autoCommit>
>
>     <maxTime>60000</maxTime>
>
>     <openSearcher>false</openSearcher>
>
> </autoCommit>
>
> <autoSoftCommit>
>
>     <maxTime>900000</maxTime>
>
> </autoSoftCommit>
>
> Currently each node has around 25M docs in with an index size of 45GB, we
> prune the data every few weeks so it never gets much above 35M docs per
> node.
>
> On reading I've seen a recommendation that we should be using
> MMapDirectory, currently it's set to NRTCachingDirectoryFactory. However
> currently the JVM is configured with -Xmx131072m, and for MMapDirectory
> I've read you should use less memory for the JVM so there is more available
> for the OS caching.
>
> Looking at the dashboard in the JVM memory usage I see:
>
> [image: JVM memory usage graph -- attachment stripped]
>
> Not sure I understand the 3 bands, assume 127.81 is Max, dark grey is in
> use at the moment and the light grey is allocated as it was used previously
> but not been cleaned up yet?
>
> I'm trying to understand if this will help me know how much would be a
> good value to change Xmx to, i.e. say 64GB based on light grey?
>
> Additionally once I've changed the max heap size is it a simple case of
> changing the config to use MMapDirectory or are there things i need to
> watch out for?
>
> Thanks
>
> Si
>
>
>