Posted to solr-user@lucene.apache.org by Darrell Burgan <Da...@infor.com> on 2014/03/24 18:29:57 UTC

Solr 4.3.1 memory swapping

Hello all, we have a SolrCloud implementation in production: two servers running Solr 4.3.1. Our search index is about 70-80GB in size. The trouble is that after several days of uptime, we suddenly hit periods where the operating system Solr runs on starts swapping heavily. This gets progressively worse until the swapping slows things down so much that ZooKeeper decides the nodes are no longer available. If both nodes are swapping, it can lead to an outage, which has happened to us a couple of times.

My question is why is it swapping?  Here's an example with numbers from our prod environment:


- Total physical memory: 16GB
- Physical memory usage: 15.58GB (99.4%)
- Total swap space: 4GB
- Swap space usage: 1.51GB (37.7%)
- Total JVM memory: 10GB
- JVM heap: 1.89GB/4.44GB

The "top" command reports that the JVM has 3.8GB resident RAM and 81.8GB virtual. Note that close to half of the swap space is in use, even though the JVM needs only a subset of the physical memory.

So what is causing the swapping, and what should I do about it? I can add more memory to the VMs if I need to, but how much? And how much should I allocate to the JVM vs. leave available to the OS?
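One general Linux knob worth checking before adding RAM (an editor's note, not something raised in the thread): the kernel's vm.swappiness setting. At its default of 60, the kernel will swap out idle process pages to make room for page cache, which can produce exactly this pattern of swap usage despite apparently free memory. A sketch of checking and lowering it:

```shell
# Show the current value; the distribution default is usually 60.
sysctl vm.swappiness

# Strongly prefer shrinking the page cache over swapping process pages.
sysctl -w vm.swappiness=1

# To persist across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 1
```

This is system configuration rather than a guaranteed fix; the exact semantics of low values differ between kernel generations, so check the documentation for the kernel actually in use.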

I could attach a screenshot of our Solr console and the top output if the listserv allows attachments.

Any ideas?

Thanks!
Darrell Burgan


Darrell Burgan | Chief Architect, PeopleAnswers
office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.burgan@infor.com | http://www.infor.com

CONFIDENTIALITY NOTE: This email (including any attachments) is confidential and may be protected by legal privilege. If you are not the intended recipient, be aware that any disclosure, copying, distribution, or use of the information contained herein is prohibited.  If you have received this message in error, please notify the sender by replying to this message and then delete this message in its entirety. Thank you for your cooperation.


RE: Solr 4.3.1 memory swapping

Posted by Darrell Burgan <Da...@infor.com>.
Thanks for the advice, Shawn - it gives me a direction to head in. My next step is probably to update the operating system and the JVM to see whether the behavior changes. If not, I'll pull in Red Hat support.
Thanks,
Darrell


-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Thursday, March 27, 2014 2:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 memory swapping

On 3/26/2014 10:26 PM, Darrell Burgan wrote:
> Okay well it didn't take long for the swapping to start happening on one of our nodes.  Here is a screen shot of the Solr console:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png
> 
> And here is a shot of top, with processes sorted by VIRT:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png
> 
> As shown, we have used up more than 25% of the swap space (over 1GB), even though there is 16GB of OS RAM available and the Solr JVM has been allocated only 10GB. Further, we're only using about 1.5GB of the 4GB committed heap, out of that 10GB maximum.
> 
> Top shows that the Solr process 21582 is using 2.4GB resident but has a virtual size of 82.4GB. Presumably that virtual size is due to the memory mapped file. The other Java process 27619 is Zookeeper.
> 
> So my question remains - why did we use any swap space at all? Doesn't 
> seem like we're experiencing memory pressure at the moment ... I'm 
> confused.  :-)

The virtual memory value is indeed that large because of the mmapped file.

There is definitely something wrong here.  I don't know whether it's Java, RHEL, or something strange with the VMware virtual machine, possibly a bad interaction with the older kernel.  With your -Xmx value, Java should never use more than about 10.5GB of physical memory, and the top output indicates that it's only using 2.4GB.  13GB is used by the OS disk cache.

You might notice that I'm not mentioning Solr in the list of possible problems.  This is because an unmodified Solr install only utilizes the Java heap, so it's Java that is in charge of allocating memory from the operating system.

Here is a script that will tell you what's using swap and how much.
This will let you be absolutely sure about whether or not Java is the problem child:

http://stackoverflow.com/a/7180078/2665648

There are instructions in the comments of the script for sorting the output.

The only major thing I would change in your JVM config (aside from perhaps reducing the max heap) is the garbage collector tuning.  I'm the original author mentioned in this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

----------------

Here's a screenshot from my dev solr server, where you can see that there is zero swap usage:

https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png

This is a bare-metal server with 16GB of RAM, running CentOS 6.5 and a pre-release snapshot of Solr 4.7.1.  With an Intel Xeon X3430, I'm pretty sure the processor architecture is NUMA-capable, but the motherboard only has one CPU socket, so it's only got one NUMA node.  As you can see from my virtual memory value, I have a lot more index data on this machine than you have on yours.  My heap is 7GB.  The other three Java processes that you can see running are in-house software related to Solr.

Performance is fairly slow with that much index and so little disk cache, but it's a dev server.  The production environment has plenty of RAM to cache the entire index.

Thanks,
Shawn


Re: Solr 4.3.1 memory swapping

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/26/2014 10:26 PM, Darrell Burgan wrote:
> Okay well it didn't take long for the swapping to start happening on one of our nodes.  Here is a screen shot of the Solr console:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png
> 
> And here is a shot of top, with processes sorted by VIRT:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png
> 
> As shown, we have used up more than 25% of the swap space (over 1GB), even though there is 16GB of OS RAM available and the Solr JVM has been allocated only 10GB. Further, we're only using about 1.5GB of the 4GB committed heap, out of that 10GB maximum.
> 
> Top shows that the Solr process 21582 is using 2.4GB resident but has a virtual size of 82.4GB. Presumably that virtual size is due to the memory mapped file. The other Java process 27619 is Zookeeper.
> 
> So my question remains - why did we use any swap space at all? Doesn't seem like we're experiencing memory pressure at the moment ... I'm confused.  :-)

The virtual memory value is indeed that large because of the mmapped file.

There is definitely something wrong here.  I don't know whether it's
Java, RHEL, or something strange with the VMware virtual machine, possibly a
bad interaction with the older kernel.  With your -Xmx value, Java
should never use more than about 10.5GB of physical memory, and the top
output indicates that it's only using 2.4GB.  13GB is used by
the OS disk cache.

You might notice that I'm not mentioning Solr in the list of possible
problems.  This is because an unmodified Solr install only utilizes the
Java heap, so it's Java that is in charge of allocating memory from the
operating system.

Here is a script that will tell you what's using swap and how much.
This will let you be absolutely sure about whether or not Java is the
problem child:

http://stackoverflow.com/a/7180078/2665648

There are instructions in the comments of the script for sorting the output.
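For reference, the linked script works by walking /proc and summing each process's "Swap:" lines from its smaps file. Here is a condensed sketch of that same technique (my own version, not the Stack Overflow script itself; it needs root to see other users' processes, and reads /proc/<pid>/cmdline since older kernels such as RHEL 5's lack /proc/<pid>/comm):

```shell
#!/bin/sh
# Sum the "Swap:" entries (values are in kB) from smaps-format text on stdin.
swap_kb() {
  awk '/^Swap:/ { total += $2 } END { print total + 0 }'
}

# Walk /proc and print per-process swap usage, largest first.
for pid in /proc/[0-9]*; do
  [ -r "$pid/smaps" ] || continue
  kb=$(swap_kb < "$pid/smaps")
  if [ "$kb" -gt 0 ]; then
    printf '%8s kB  %s\n' "$kb" "$(tr '\0' ' ' < "$pid/cmdline")"
  fi
done | sort -rn
```

Sorting numerically on the first column puts the heaviest swap users at the top, which is what you want when deciding whether Java really is the problem child.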

The only major thing I would change in your JVM config (aside from
perhaps reducing the max heap) is the garbage collector tuning.  I'm the
original author mentioned in this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
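For context, the CMS-based tuning discussed on pages like that one generally looks something like the following. These particular flags and values are illustrative examples only, not the wiki page's exact recommendation; real values should come from that page and from your own GC logs:

```shell
# Example HotSpot GC options for a Solr 4.x JVM (all values illustrative).
java -Xms6g -Xmx6g \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:+CMSParallelRemarkEnabled \
     -XX:+ParallelRefProcEnabled \
     -jar start.jar
```

Pinning -Xms to -Xmx avoids heap-resizing pauses, and CMSInitiatingOccupancyFraction starts concurrent collections early enough to avoid full stop-the-world GCs on a mostly-full heap.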

----------------

Here's a screenshot from my dev solr server, where you can see that
there is zero swap usage:

https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png

This is a bare-metal server with 16GB of RAM, running CentOS 6.5 and a
pre-release snapshot of Solr 4.7.1.  With an Intel Xeon X3430, I'm
pretty sure the processor architecture is NUMA-capable, but the motherboard
only has one CPU socket, so it's only got one NUMA node.  As you can see
from my virtual memory value, I have a lot more index data on this machine
than you have on yours.  My heap is 7GB.  The other three Java processes
that you can see running are in-house software related to Solr.

Performance is fairly slow with that much index and so little disk
cache, but it's a dev server.  The production environment has plenty of
RAM to cache the entire index.

Thanks,
Shawn


RE: Solr 4.3.1 memory swapping

Posted by Darrell Burgan <Da...@infor.com>.
Okay well it didn't take long for the swapping to start happening on one of our nodes.  Here is a screen shot of the Solr console:

https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png

And here is a shot of top, with processes sorted by VIRT:

https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png

As shown, we have used up more than 25% of the swap space (over 1GB), even though there is 16GB of OS RAM available and the Solr JVM has been allocated only 10GB. Further, we're only using about 1.5GB of the 4GB committed heap, out of that 10GB maximum.

Top shows that the Solr process 21582 is using 2.4GB resident but has a virtual size of 82.4GB. Presumably that virtual size is due to the memory mapped file. The other Java process 27619 is Zookeeper.

So my question remains - why did we use any swap space at all? Doesn't seem like we're experiencing memory pressure at the moment ... I'm confused.  :-)

Thanks!
Darrell



-----Original Message-----
From: Darrell Burgan [mailto:Darrell.Burgan@infor.com] 
Sent: Wednesday, March 26, 2014 10:45 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.1 memory swapping

Okay I'll post some shots somewhere people can get to them to demonstrate what I'm seeing. Unfortunately I just deployed some unrelated stuff to Solr that caused me to restart each node in the SolrCloud cluster. So right now the swap usage is minimal. I'll let it grow for a few days then send some URLs to the list.

BTW, we're running RHEL 5.9 (Tikanga) and uname -a reports:

Linux da-pans-xxx 2.6.18-348.12.1.el5 #1 SMP Mon Jul 1 17:54:12 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks!
Darrell



-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org]
Sent: Wednesday, March 26, 2014 8:14 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.1 memory swapping

> Thanks - we're currently running Solr inside of RHEL virtual machines 
> inside of VMware. Running "numactl --hardware" inside the VM shows the
> following:
>
> available: 1 nodes (0)
> node 0 size: 16139 MB
> node 0 free: 364 MB
> node distances:
> node   0
>   0:  10
>
> So there is only one node being shown, with only one memory bank.  Am I
> correct in assuming that means NUMA can't be the issue?
>
> My best guess as to what is going on relates to that big memory-mapped 
> file Solr allocates. Our search index is about 60GB or so, much bigger 
> than the 16GB RAM the operating system has to work with. Could it be 
> that the swapping is due to the memory-mapped file in some way?

If mmap is leading to swapping, that's a serious operating system glitch.
That's not supposed to happen. The NUMA idea is the only thing I know about that could cause this, assuming that there's not something else on the system that's using memory.

If you could run top, press shift-M to sort by memory, and then get a screenshot, that would be good. Be sure the terminal has enough height that we can see quite a few of the top entries.

Thanks,
Shawn




RE: Solr 4.3.1 memory swapping

Posted by Darrell Burgan <Da...@infor.com>.
Okay I'll post some shots somewhere people can get to them to demonstrate what I'm seeing. Unfortunately I just deployed some unrelated stuff to Solr that caused me to restart each node in the SolrCloud cluster. So right now the swap usage is minimal. I'll let it grow for a few days then send some URLs to the list.

BTW, we're running RHEL 5.9 (Tikanga) and uname -a reports:

Linux da-pans-xxx 2.6.18-348.12.1.el5 #1 SMP Mon Jul 1 17:54:12 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Thanks!
Darrell



-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Wednesday, March 26, 2014 8:14 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.1 memory swapping

> Thanks - we're currently running Solr inside of RHEL virtual machines 
> inside of VMware. Running "numactl --hardware" inside the VM shows the
> following:
>
> available: 1 nodes (0)
> node 0 size: 16139 MB
> node 0 free: 364 MB
> node distances:
> node   0
>   0:  10
>
> So there is only one node being shown, with only one memory bank.  Am I
> correct in assuming that means NUMA can't be the issue?
>
> My best guess as to what is going on relates to that big memory-mapped 
> file Solr allocates. Our search index is about 60GB or so, much bigger 
> than the 16GB RAM the operating system has to work with. Could it be 
> that the swapping is due to the memory-mapped file in some way?

If mmap is leading to swapping, that's a serious operating system glitch.
That's not supposed to happen. The NUMA idea is the only thing I know about that could cause this, assuming that there's not something else on the system that's using memory.

If you could run top, press shift-M to sort by memory, and then get a screenshot, that would be good. Be sure the terminal has enough height that we can see quite a few of the top entries.

Thanks,
Shawn




RE: Solr 4.3.1 memory swapping

Posted by Shawn Heisey <so...@elyograg.org>.
> Thanks - we're currently running Solr inside of RHEL virtual machines
> inside of VMware. Running "numactl --hardware" inside the VM shows the
> following:
>
> available: 1 nodes (0)
> node 0 size: 16139 MB
> node 0 free: 364 MB
> node distances:
> node   0
>   0:  10
>
> So there is only one node being shown, with only one memory bank.  Am I
> correct in assuming that means NUMA can't be the issue?
>
> My best guess as to what is going on relates to that big memory-mapped
> file Solr allocates. Our search index is about 60GB or so, much bigger
> than the 16GB RAM the operating system has to work with. Could it be that
> the swapping is due to the memory-mapped file in some way?

If mmap is leading to swapping, that's a serious operating system glitch.
That's not supposed to happen. The NUMA idea is the only thing I know
about that could cause this, assuming that there's not something
else on the system that's using memory.

If you could run top, press shift-M to sort by memory, and then get a
screenshot, that would be good. Be sure the terminal has enough height
that we can see quite a few of the top entries.

Thanks,
Shawn




RE: Solr 4.3.1 memory swapping

Posted by Darrell Burgan <Da...@infor.com>.
Thanks - we're currently running Solr inside of RHEL virtual machines inside of VMware. Running "numactl --hardware" inside the VM shows the following:

available: 1 nodes (0)
node 0 size: 16139 MB
node 0 free: 364 MB
node distances:
node   0 
  0:  10

So there is only one node being shown, with only one memory bank.  Am I correct in assuming that means NUMA can't be the issue?

My best guess as to what is going on relates to that big memory-mapped file Solr allocates. Our search index is about 60GB or so, much bigger than the 16GB RAM the operating system has to work with. Could it be that the swapping is due to the memory-mapped file in some way?


-----Original Message-----
From: Lan [mailto:dung.lan@gmail.com] 
Sent: Wednesday, March 26, 2014 12:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 memory swapping

It could be related to NUMA.

Check out this article about it which has some fixes that worked for me.

http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/





--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-tp4126641p4127191.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.3.1 memory swapping

Posted by Lan <du...@gmail.com>.
It could be related to NUMA.

Check out this article about it which has some fixes that worked for me.

http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
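For machines that do have more than one NUMA node, the usual workaround from that article is to interleave the JVM's allocations across nodes at startup. A sketch against the Jetty start.jar that Solr 4.x ships with (the heap size here is just an example):

```shell
# Spread the JVM's memory allocations evenly across all NUMA nodes instead
# of filling one node first; only relevant on multi-node NUMA hardware.
numactl --interleave=all java -Xmx10g -jar start.jar
```

On a single-node VM like the one described earlier in the thread, this is a no-op, which is why the NUMA hypothesis was largely ruled out.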




