
Posted to users@solr.apache.org by Steffen Moldenhauer <s....@intershop.de.INVALID> on 2023/03/13 11:59:23 UTC

Solr Operator Solr Pod slow restart cp-solr-xml time

Hi,

We're using Solr Operator 0.6.0.
In some of our deployments, restarting a Solr pod takes anywhere from several minutes up to half an hour.
The cause appears to be the init container cp-solr-xml, which runs the command

    cp /tmp/solr.xml /tmp-config/solr.xml && chown -R 8983:8983 /var/solr/data/backup-restore/local-collection-backups-1

and especially its recursive chown. We mitigated this somewhat by cleaning up the backup volume regularly, but deployments with a larger number of collections (~400) are still slow.

How can we avoid this, or how can we speed it up? Would something like "find -not -user X -not -group X | xargs chown" make sense, or make a difference here?
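A minimal sketch of the selective approach, assuming a POSIX find and the 8983:8983 uid:gid from the init command (the function name fix_backup_ownership is hypothetical, not part of the operator). Note that "-not -user X -not -group X" as written ANDs the two tests, so it skips entries where only the owner or only the group is wrong; the sketch uses -o instead. "-exec ... {} +" batches paths like xargs, but does not break on file names containing spaces.

```shell
#!/bin/sh
# Hypothetical helper: chown only the entries whose owner OR group differs
# from the target, instead of rewriting every entry like "chown -R" does.
# Assumes find accepts numeric ids for -user/-group (POSIX and GNU find do).
fix_backup_ownership() {
  dir=$1
  uid=${2:-8983}   # default taken from the init container's "chown 8983:8983"
  gid=${3:-8983}
  # \( ! -user U -o ! -group G \) is true if either the owner or the group is wrong.
  # -exec chown ... {} + runs chown on batches of matched paths; if nothing
  # matches, chown is never run at all.
  find "$dir" \( ! -user "$uid" -o ! -group "$gid" \) -exec chown "$uid:$gid" {} +
}

# Usage (path from the init container command):
# fix_backup_ownership /var/solr/data/backup-restore/local-collection-backups-1
```

This still has to look at every entry, so on a network filesystem the directory walk itself may remain the dominant cost; it only saves the ownership writes for entries that are already correct.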

Regards
Steffen Moldenhauer

Re: Solr Operator Solr Pod slow restart cp-solr-xml time

Posted by Shawn Heisey <el...@elyograg.org>.

On 3/14/23 08:45, Steffen Moldenhauer wrote:
> It is an azurefile-csi storage. I think that's some kind of SMB. Yes, that's slow (and the cheapest option), but it has met our backup performance requirements so far.
> There is roughly one daily backup of every collection, kept for 7 days, so it's ~7x400 backups.

If Solr is the only thing writing to the backup location, and all your
Solr instances use 8983:8983 for uid:gid, then in my opinion you should
comment out or remove the chown from the startup. Everything should
already have the correct permissions, so further chowns are unnecessary.
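Before dropping the chown, it may be worth confirming that this premise holds on the actual volume. A read-only sketch, assuming a POSIX find and the 8983:8983 ids (the function name count_wrong_owner is hypothetical), that counts entries whose owner or group differs without changing anything:

```shell
#!/bin/sh
# Hypothetical audit helper: prints how many entries under a directory are
# NOT owned by uid:gid. Read-only; nothing is modified.
count_wrong_owner() {
  dir=$1
  uid=${2:-8983}
  gid=${3:-8983}
  # -print emits one path per line and wc -l counts them; tr strips the
  # padding some wc implementations add. A file name containing a newline
  # would be counted twice, which is acceptable for a rough audit.
  find "$dir" \( ! -user "$uid" -o ! -group "$gid" \) -print | wc -l | tr -d ' '
}

# Usage:
# count_wrong_owner /var/solr/data/backup-restore/local-collection-backups-1
```

If this keeps printing 0, the chown in the init container is doing no useful work on that volume.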

If something other than Solr reads the backups using a UID other than
8983, you'll need to make sure it has the correct permissions, probably
by adding it to the solr (8983) group.
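To check that a separate reader account actually ended up in the solr group, a small sketch assuming the group id is 8983 (the function name in_group is hypothetical). On Linux the account is typically added with "usermod -aG <group> <user>" run as root; the new membership only applies to processes started after the change.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given user (or, with no user, the
# current process) is a member of the numeric group gid (default 8983).
in_group() {
  gid=${1:-8983}
  # With a user name, id -G looks that user up; without one, it reports
  # the numeric group ids of the current process.
  if [ -n "$2" ]; then groups=$(id -G "$2"); else groups=$(id -G); fi
  # Unquoted $groups splits on spaces, so printf puts one id per line;
  # grep -x then requires an exact whole-line match.
  printf '%s\n' $groups | grep -qx "$gid"
}

# Usage: in_group 8983 backupreader || echo "backupreader is not in gid 8983"
```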

Thanks,
Shawn

RE: Solr Operator Solr Pod slow restart cp-solr-xml time

Posted by Steffen Moldenhauer <s....@intershop.de.INVALID>.

It is an azurefile-csi storage. I think that's some kind of SMB. Yes, that's slow (and the cheapest option), but it has met our backup performance requirements so far.
There is roughly one daily backup of every collection, kept for 7 days, so it's ~7x400 backups.

Thanks for the ideas. 
Steffen

> -----Original Message-----
> From: Shawn Heisey <ap...@elyograg.org>
> Sent: Tuesday, March 14, 2023 01:31
> To: users@solr.apache.org
> Subject: Re: Solr Operator Solr Pod slow restart cp-solr-xml time

Re: Solr Operator Solr Pod slow restart cp-solr-xml time

Posted by Shawn Heisey <ap...@elyograg.org>.

On 3/13/23 05:59, Steffen Moldenhauer wrote:
> esp. the recursive chown seems to be the cause. We mitigated it a bit by cleaning up the backup volume regularly. But there are deployments with a larger number of collections (~400) and then it is still slow.

Matthew's question is exactly what I was thinking. A recursive chown
should not take that long on almost any *NIX filesystem with the
directory structure that Solr creates, so we really want to know what
kind of filesystem it's on.

If it's a network filesystem (NFS, SMB, S3, Google Drive, or similar)
then that might take a long time, as that is the nature of a network
filesystem. Doing a find piped through xargs/chown is likely to take
almost as long as the recursive chown does, because find still has to
look up every entry over the network.
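One way to test this on a given volume is to time a read-only walk of the tree. If the walk alone takes nearly as long as the restart delay, skipping already-correct entries will not help much. A sketch, assuming date supports +%s (GNU and BSD date do; it is not strictly POSIX) and using a hypothetical function name time_walk:

```shell
#!/bin/sh
# Hypothetical helper: walk a directory tree read-only and report how many
# entries were visited and how many whole seconds the walk took.
time_walk() {
  start=$(date +%s)
  n=$(find "$1" -print | wc -l | tr -d ' ')
  end=$(date +%s)
  printf 'entries=%s seconds=%s\n' "$n" "$((end - start))"
}

# Usage:
# time_walk /var/solr/data/backup-restore/local-collection-backups-1
```

This measures only the directory walk; chown -R additionally issues one ownership change per entry, so the real chown time will be somewhat higher.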

It does seem likely that you'd have backups on a network filesystem, as
that's how the backup feature in Solr is designed to work -- the same
filesystem mounted in the same place on all Solr nodes.

Maybe the startup can be changed so it does the chown in the background
instead of holding up the Solr start. I have never used the operator,
so I don't know anything about it.

Another idea is to remove the chown from the startup and have it run
periodically on the system that shares the filesystem over the network.
Or do the chown manually once, remove the chown from the startup, and
just don't worry about it, because if Solr is the only thing writing to
the backup location, everything should have correct permissions.

Thanks,
Shawn

Re: Solr Operator Solr Pod slow restart cp-solr-xml time

Posted by matthew sporleder <ms...@gmail.com>.

What type of filesystem is this one? It should be a PV.
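A quick way to answer this from inside a pod, assuming GNU coreutils stat is available in the image (the function name fs_type is hypothetical). For a network mount it would typically report a network type such as nfs or cifs rather than a local type such as xfs.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type backing a path.
# Assumes GNU coreutils stat: -f queries the filesystem instead of the file,
# and %T prints the filesystem type in human-readable form.
fs_type() {
  stat -f -c %T "$1"
}

# Usage:
# fs_type /var/solr/data/backup-restore/local-collection-backups-1
```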

Even 400 collections shouldn't be that many files.
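This estimate can be checked directly. A sketch, assuming each immediate subdirectory of the backup location holds one backup (the actual layout depends on the Solr version and backup settings; the function name files_per_backup is hypothetical), that prints the number of files in each subdirectory, largest first:

```shell
#!/bin/sh
# Hypothetical helper: for each immediate subdirectory of $1, print
# "<file count><TAB><directory>", sorted with the largest count first.
files_per_backup() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    n=$(find "$d" -type f | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$n" "${d%/}"   # ${d%/} drops the trailing slash
  done | sort -rn
}

# Usage:
# files_per_backup /var/solr/data/backup-restore/local-collection-backups-1 | head
```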
