Posted to solr-user@lucene.apache.org by Paweł Róg <pr...@gmail.com> on 2021/01/20 11:59:47 UTC

Solr Cloud freezes during scheduled backup

Hello everyone,
I have a nasty problem with the scheduled Solr collections backup. From
time to time, when a scheduled backup is triggered (the backup operation
takes around 10 minutes), Solr freezes for 20-30 seconds. The freeze happens
on one Solr instance at a time, but it affects the latency of all queries
(because queries are distributed across 6 shards). I can reproduce the
problem only when updates to the Solr cluster are enabled. When I disable
updates, the problem is gone.

The Lucene index is not big and fits into the OS cache. I am wondering
whether taking a backup can be the culprit, for example by messing up the
operating system caches. Maybe all the files copied to NFS are eating up
the OS cache, and when the OS reaches high memory usage it starts
reclaiming memory and causes Solr to freeze.

During the freeze, monitoring charts show higher I/O wait times. In
addition, the Solr nodes that seem to be affected reach 95-100% total
memory usage (used + buffers + caches).
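
To check whether the backup is actually filling the page cache, one option is to sample /proc/meminfo on an affected node before and during the backup. The sketch below is a minimal Python example, assuming a Linux host (inside a Docker container, /proc/meminfo normally shows the host's numbers, not the container limit). It computes the same used / buffers+cache split the monitoring charts show, plus the Dirty and Writeback counters. The "used" formula is a simplification (tools like free(1) also count reclaimable slab as cache), and reading a large Dirty/Writeback backlog as evidence for the page-cache theory is an assumption to verify, not a diagnosis.

```python
import os


def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo-formatted text."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    return fields


def memory_summary(fields):
    """Percentages in the style of 'used + buffers + caches' charts (simplified)."""
    total = fields["MemTotal"]
    free = fields["MemFree"]
    buffers = fields.get("Buffers", 0)
    cached = fields.get("Cached", 0)
    used = total - free - buffers - cached
    return {
        "used_pct": 100.0 * used / total,
        "buffers_cache_pct": 100.0 * (buffers + cached) / total,
        "total_pct": 100.0 * (total - free) / total,
        # Pages waiting to be written out (e.g. backup data headed for NFS).
        "dirty_kb": fields.get("Dirty", 0),
        "writeback_kb": fields.get("Writeback", 0),
    }


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):  # only meaningful on Linux
        with open(path) as f:
            print(memory_summary(parse_meminfo(f.read())))
```

Running it every few seconds across the backup window would show whether total_pct climbs toward 100% together with dirty_kb, or whether the memory growth is somewhere else.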

I cannot see anything valuable in the GC logs apart from a message which
suggests that the application was stopped for 20-30 seconds (Application
time).
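
If the JVM runs with -XX:+PrintGCApplicationStoppedTime (JDK 8-style logging; JDK 9+ unified logging uses a different format), each safepoint pause is logged as a line containing "Total time for which application threads were stopped: N seconds, Stopping threads took: M seconds". The minimal Python sketch below, assuming that JDK 8 format, extracts pauses above a threshold. Comparing the two numbers is useful: if most of a 20-30 second stop is spent in "Stopping threads took", the time went to bringing threads to the safepoint rather than to the garbage collection itself.

```python
import re
import sys

# Matches JDK 8 -XX:+PrintGCApplicationStoppedTime lines. The
# "Stopping threads took" part is optional in case a JVM version omits it.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds"
    r"(?:, Stopping threads took: ([0-9.]+) seconds)?"
)


def long_pauses(log_text, threshold_seconds=1.0):
    """Return (stopped, time_to_safepoint) tuples for pauses >= threshold.

    time_to_safepoint is None when the log line does not include it.
    """
    pauses = []
    for match in STOPPED_RE.finditer(log_text):
        stopped = float(match.group(1))
        ttsp = float(match.group(2)) if match.group(2) else None
        if stopped >= threshold_seconds:
            pauses.append((stopped, ttsp))
    return pauses


if __name__ == "__main__":
    # Usage: python gc_pauses.py /path/to/solr_gc.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for stopped, ttsp in long_pauses(f.read()):
                print(f"stopped {stopped:.1f}s, time to safepoint {ttsp}")
```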

The cluster consists of 12 machines. Each Solr node runs on Ubuntu 16.04
inside Docker, and all the servers are running in AWS EC2. The EC2 instances
have local SSD disks (but the same problem appeared with EBS).

Has anyone had a similar problem and can share some thoughts? I'd
appreciate any help.

--
Pawel Rog

Re: Solr Cloud freezes during scheduled backup

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi Pawel,

This definitely sounds like garbage collection biting you.

Backups themselves aren't usually memory intensive, but if indexing is
going on at the same time you should expect elevated memory usage.
Essentially this is because for each core being backed up, Solr needs
to hold pieces of two different "versions" of the index in memory: the
commit-point being backed up, and the current state of the index with
the new documents.
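
As a toy model of this (not Solr's actual bookkeeping), treat each commit point as the set of segment files it references. While the backup holds the old commit point, segment files that have already been merged away in the current index cannot be deleted yet, so the node carries the union of both sets. The segment names and sizes in this sketch are hypothetical.

```python
# Toy model: a commit point is the set of segment files it references.
# File names and sizes below are hypothetical.


def retention_report(backup_commit, current_commit, sizes):
    """Sizes kept while a backup holds backup_commit and indexing continues.

    `sizes` maps file name -> size in any consistent unit.
    """
    retained = backup_commit | current_commit  # both versions must survive
    held_only_by_backup = backup_commit - current_commit  # merged away, not yet deletable
    return {
        "retained": sum(sizes[f] for f in retained),
        "current": sum(sizes[f] for f in current_commit),
        "held_only_by_backup": sum(sizes[f] for f in held_only_by_backup),
    }


if __name__ == "__main__":
    # _0 and _1 were merged into _2 after the backup started; _3 holds new documents.
    backup = {"_0.cfs", "_1.cfs"}
    current = {"_2.cfs", "_3.cfs"}
    sizes_mb = {"_0.cfs": 100, "_1.cfs": 50, "_2.cfs": 150, "_3.cfs": 10}
    print(retention_report(backup, current, sizes_mb))
```

With these hypothetical numbers the node carries 310 MB instead of 160 MB while the backup runs, roughly twice the current index; with indexing disabled no merges happen, the two sets stay identical, and the overhead disappears.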

If disabling indexing during backups is feasible, that's where I'd
start in your shoes. If it's not, you might need to consider tweaking
your heap and JVM GC settings to shorten the long individual GC pauses
you're reporting.
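
A minimal Python sketch of the first option, assuming the backup is driven through the Collections API BACKUP action with an async request ID and polled with REQUESTSTATUS. Pausing updates is done on the client side in this sketch, so pause_indexing and resume_indexing are hypothetical hooks into whatever feeds updates to the cluster; fetch_json is also an assumption (any HTTP client call that returns parsed JSON would do). The collection name, backup name, and location used in a real call are up to you.

```python
import time
from urllib.parse import urlencode


def backup_url(solr_base, collection, name, location, request_id):
    """Collections API BACKUP call, submitted asynchronously."""
    params = {
        "action": "BACKUP",
        "collection": collection,
        "name": name,
        "location": location,
        "async": request_id,
        "wt": "json",
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def status_url(solr_base, request_id):
    """Collections API REQUESTSTATUS call for an async request."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def backup_with_indexing_paused(fetch_json, pause_indexing, resume_indexing,
                                solr_base, collection, name, location,
                                request_id, poll_seconds=5.0, sleep=time.sleep):
    """Stop client-side updates, run the backup, and always resume updates."""
    pause_indexing()  # hypothetical hook: stop the update feeders
    try:
        fetch_json(backup_url(solr_base, collection, name, location, request_id))
        while True:
            state = fetch_json(status_url(solr_base, request_id))["status"]["state"]
            if state == "completed":
                return state
            if state in ("failed", "notfound"):
                raise RuntimeError(f"backup {request_id} ended in state {state!r}")
            sleep(poll_seconds)  # still "submitted" or "running"
    finally:
        resume_indexing()  # hypothetical hook: runs even if the backup fails
```

The try/finally is the important design choice: a failed or interrupted backup must not leave indexing switched off.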

Good luck,

Jason
