Posted to users@solr.apache.org by Saksham Gupta <sa...@indiamart.com.INVALID> on 2023/06/27 12:27:43 UTC

Solr Cloud Backup Strategy and Data Corruption Prevention

Hi Solr Developers,
I am reaching out to ask about best practices for implementing a backup
strategy in Solr Cloud. We recently migrated from standalone Solr (6.5)
to Solr 8.10, where we have a collection with data divided among 8 shards
using implicit routing. Until now, we have kept the standalone Solr
as a backup in case something goes wrong on Solr Cloud (data
corruption, deletion, etc.).
However, we now wish to retire the standalone Solr and move fully to
Solr Cloud. My concern is what happens if the data in Solr Cloud becomes
corrupted or is deleted, so that we have to replace or reindex the entire
dataset, which is a time-consuming process. We aim to minimize downtime
as much as possible.
I would greatly appreciate any insights or recommendations you could
provide to address this concern.

Thank you in advance.

Best regards,
Saksham

Re: Solr Cloud Backup Strategy and Data Corruption Prevention

Posted by Saksham Gupta <sa...@indiamart.com.INVALID>.
Thanks, Joe, for such a detailed solution. I think this can help us with
the problem.

On Wed, Jun 28, 2023 at 1:47 PM Joe Jones (DHCW - Software Development)
<Jo...@wales.nhs.uk.invalid> wrote:

> For our small (50 million document), 12-shard real-time index we back up
> each node every night and perform an integrity check on it.
>
> We run a simple batch file (Windows) that loops through the environments
> and generates curl calls to start the backup process, such as:
> http://localhost:18983/solr/wcrs/replication?command=backup&location=D:\Solr\backup\node1&name=bak
>
> At a later point we run an integrity check with another script, which calls:
> java -cp 'lucene-core-9.3.0.jar;lucene-backward-codecs-9.3.0.jar' -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex D:\\Solr\\backup\\node1\\snapshot.bak
>
> The backup is essentially Solr replicating the indexes to another data
> directory. Our organisation's backup scheduling then backs up that data
> each night and keeps it for however long we set the rollover period. As
> you can imagine, with large indexes rolling backups could end up storing
> a huge amount of data, so that needs to be balanced.

RE: Solr Cloud Backup Strategy and Data Corruption Prevention

Posted by "Joe Jones (DHCW - Software Development)" <Jo...@wales.nhs.uk.INVALID>.
For our small (50 million document), 12-shard real-time index we back up each node every night and perform an integrity check on it.

We run a simple batch file (Windows) that loops through the environments and generates curl calls to start the backup process, such as:
http://localhost:18983/solr/wcrs/replication?command=backup&location=D:\Solr\backup\node1&name=bak
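For readers who are not on Windows, the loop described above might look roughly like the Python sketch below. It is only an illustration, not the batch file used here: the second node, its port, and the helper names are hypothetical, and the only parts taken from this message are the replication handler URL and its command, location and name parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def backup_url(base_url: str, core: str, location: str, name: str) -> str:
    """Build a replication-handler backup URL for one core."""
    # urlencode percent-encodes the Windows path so it survives as a query value.
    query = urlencode({"command": "backup", "location": location, "name": name})
    return f"{base_url}/solr/{core}/replication?{query}"


def trigger_backup(url: str, timeout: float = 30.0) -> int:
    """Send the backup request and return the HTTP status code.

    Assumption: the handler replies once the backup has been started, so a
    200 here should not be read as "snapshot finished".
    """
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Hypothetical node list: (base URL, core name, backup directory).
NODES = [
    ("http://localhost:18983", "wcrs", r"D:\Solr\backup\node1"),
    ("http://localhost:18984", "wcrs", r"D:\Solr\backup\node2"),
]

# Print the URLs only; call trigger_backup(url) against a real node to start a backup.
for base_url, core, location in NODES:
    print(backup_url(base_url, core, location, "bak"))
```

Building the query with urlencode avoids problems with the backslashes and colon in the location value.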

At a later point we run an integrity check with another script, which calls:
java -cp 'lucene-core-9.3.0.jar;lucene-backward-codecs-9.3.0.jar' -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex D:\\Solr\\backup\\node1\\snapshot.bak
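A scripted version of that check could be sketched as follows. The helper names are hypothetical, and the sketch assumes that CheckIndex exits with a non-zero status when it finds problems, which is worth confirming against the Lucene version in use. The command arguments are the ones shown above.

```python
import os
import subprocess


def check_index_command(jars, snapshot_dir):
    """Return the argument list for running Lucene's CheckIndex on a snapshot."""
    # os.pathsep is ';' on Windows and ':' elsewhere, matching Java's classpath rules.
    classpath = os.pathsep.join(jars)
    return [
        "java",
        "-cp", classpath,
        "-ea:org.apache.lucene...",  # enable assertions for Lucene packages
        "org.apache.lucene.index.CheckIndex",
        snapshot_dir,
    ]


def snapshot_is_clean(jars, snapshot_dir):
    """Run CheckIndex and report whether it exited with status 0.

    Assumption: a non-zero exit status means the index has problems
    (or the tool could not run at all).
    """
    result = subprocess.run(
        check_index_command(jars, snapshot_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, snapshot_is_clean(["lucene-core-9.3.0.jar", "lucene-backward-codecs-9.3.0.jar"], r"D:\Solr\backup\node1\snapshot.bak") would run the same check as the command above.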

The backup is essentially Solr replicating the indexes to another data directory. Our organisation's backup scheduling then backs up that data each night and keeps it for however long we set the rollover period. As you can imagine, with large indexes rolling backups could end up storing a huge amount of data, so that needs to be balanced.
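To make the storage trade-off concrete, here is a rough sketch of a retention rule. It assumes every nightly backup is a full copy of the index (no incremental storage), and the function names and the (name, timestamp) representation are hypothetical rather than part of our scripts.

```python
def snapshots_to_prune(snapshots, keep):
    """Return the names of snapshots that fall outside the newest `keep`.

    `snapshots` is a list of (name, unix_timestamp) pairs.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


def retained_bytes(index_bytes, keep):
    """Estimate storage for `keep` full copies of an index of `index_bytes`."""
    return index_bytes * keep
```

With a 200 GB index and 14 nightly copies, retained_bytes gives 2,800 GB, which is the kind of figure to weigh against how far back a restore realistically needs to reach.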


Re: Solr Cloud Backup Strategy and Data Corruption Prevention

Posted by Saksham Gupta <sa...@indiamart.com.INVALID>.
Hi All,
Any help with this problem? What is the standard practice for creating
backups on Solr Cloud?
