Posted to solr-user@lucene.apache.org by "Kommu, Vinodh K." <vk...@dtcc.com> on 2020/05/21 12:17:10 UTC

How to restore deleted collection from filesystem

Hi,

One of our largest collections, which holds 3.2 billion docs, was accidentally deleted in our QA environment. Unfortunately, we don't have a recent Solr backup of this collection to restore from. The only option left is to restore the deleted replica directories under the data directory using the NetBackup restore process.

We haven't done this kind of restore before, so the following things are not clear:

1. Since the collection was deleted (and not yet recreated), if the necessary replica directories and files are restored to the same location, will the collection work without being created again?
2. If the above option doesn't work, we obviously have to create the collection, but the replica names and placement may not match those of the deleted collection (we create collections using rule-based replica placement). In that case, what needs to be done to restore the collection smoothly? Are there any predefined steps for handling this kind of scenario? Any suggestions are greatly appreciated.


Thanks & Regards,
Vinodh

DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

RE: How to restore deleted collection from filesystem

Posted by "Kommu, Vinodh K." <vk...@dtcc.com>.
Thanks, Erick.

We were able to successfully restore the deleted collection data as suggested. In fact, we tried both of the approaches below, and both worked fine:

1) Create the collection with the same number of shards and replication factor = 1.
2) Create the collection with the same number of shards and the same replication factor as the deleted collection.

Because we create collections using rule-based replica placement, the first approach makes it a little difficult to work out which node each replica should be added to manually. With the second approach, the replicas already exist, so we just copied the shard1 leader's index files from the restored data into the index directory of every corresponding shard1 replica in the newly created collection. Once the copy was done, we brought up the Solr nodes and everything worked fine.


Thanks & Regards,
Vinodh

Re: How to restore deleted collection from filesystem

Posted by Erick Erickson <er...@gmail.com>.
See inline.

> On May 21, 2020, at 10:13 AM, Kommu, Vinodh K. <vk...@dtcc.com> wrote:
> 
> Thanks, Erick, for the quick response.
>
> Yes, our VMs are equipped with NetBackup, which is a file-based backup tool; it can restore any deleted files or directories from the latest available full backup.
>
> Can we create an empty collection with the same name as the deleted one, with the same number of shards and replicas, and copy the content from each restored core to the corresponding core?

Kind of. The new collection does NOT need to have the same name. There is no need at all to create the same number of replicas to start, and I do NOT recommend it. As I said earlier, create a single-replica (i.e. leader-only) collection with the same number of shards. Copy _one_ data dir (not everything under the core) to that _one_ corresponding replica. It doesn’t matter which replica you copy from.

> I mean, copy all contents (directories and files) under the Oldcollection_shard1_replica1 core in the old collection to the corresponding Newcollection_shard1_replica1 core in the new collection. Would this approach work?
> 

As above, do not do this. Just copy the data dir from one of your backup copies to the leader-only replica. It doesn’t matter at all whether the replica names are the same. The only thing that matters is that the shard number is identical. For instance, copy blah/blah/collection1_shard1_replica_57/data to blah/blah/collection1_shard1_replica_1/data if you want.
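
As a minimal sketch of the "match on shard, ignore the replica name" rule, the Python below pairs each core of a new leader-only collection with one restored core from the same shard. The helper names (shard_of, plan_copies) are hypothetical, and the regular expression assumes core directory names of the form <collection>_shardN_replica..., as in the example paths above; adjust it if your names differ.

```python
import re

# Assumption: core names contain "_shard<N>_replica", e.g.
# "collection1_shard1_replica_57". Only the shard part is used.
_SHARD_RE = re.compile(r"_(shard\d+)_replica")


def shard_of(core_name):
    """Return the shard label (e.g. "shard1") embedded in a core name."""
    match = _SHARD_RE.search(core_name)
    if match is None:
        raise ValueError(f"no shard label found in {core_name!r}")
    return match.group(1)


def plan_copies(backup_cores, new_cores):
    """Map each new core to ONE backup core from the same shard.

    Which backup replica is chosen does not matter, so the first one
    in sorted order is taken for each shard.
    """
    source_by_shard = {}
    for core in sorted(backup_cores):
        source_by_shard.setdefault(shard_of(core), core)

    plan = {}
    for core in new_cores:
        shard = shard_of(core)
        if shard not in source_by_shard:
            raise KeyError(f"no restored core available for {shard}")
        plan[core] = source_by_shard[shard]
    return plan
```

The result is only a plan (new core name to source core name); it copies nothing, so it can be reviewed before any files are touched.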

Once you have a one-replica collection with the data in it and you’ve done a bit of verification, use ADDREPLICA to build it out.

> Lastly, is there anything we need to be aware of in core.properties in the newly created collection, or any reference specific to the new collection?

Do not copy or touch core.properties; you can mess this up thoroughly by hand-editing. The _only_ thing you copy is the data directory, which will contain a tlog and an index directory. And the tlog isn’t even necessary.
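
A rough Python sketch of that copy, to be run only while the Solr node hosting the new replica is shut down. Since the tlog isn't needed, this version copies just the index directory and leaves core.properties and everything else in the new core alone. The function name replace_index is hypothetical, and the layout <core>/data/index is an assumption; if your data directory uses a differently named index directory, the paths need adjusting.

```python
import shutil
from pathlib import Path


def replace_index(backup_core, new_core):
    """Replace <new_core>/data/index with <backup_core>/data/index.

    Nothing else is touched: core.properties in the new core is left
    exactly as Solr wrote it, and the tlog is not copied.
    """
    src = Path(backup_core) / "data" / "index"
    dst = Path(new_core) / "data" / "index"
    if not src.is_dir():
        raise FileNotFoundError(f"restored index directory not found: {src}")

    # Drop the empty index that Solr created for the new replica.
    if dst.exists():
        shutil.rmtree(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)

    shutil.copytree(src, dst)
    return dst
```

For example, replace_index("blah/blah/collection1_shard1_replica_57", "blah/blah/collection1_shard1_replica_1") would mirror the copy described above, with "blah/blah" standing in for the real parent directories.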

Best,
Erick

RE: How to restore deleted collection from filesystem

Posted by "Kommu, Vinodh K." <vk...@dtcc.com>.
Thanks, Erick, for the quick response.

Yes, our VMs are equipped with NetBackup, which is a file-based backup tool; it can restore any deleted files or directories from the latest available full backup.

Can we create an empty collection with the same name as the deleted one, with the same number of shards and replicas, and copy the content from each restored core to the corresponding core? I mean, copy all contents (directories and files) under the Oldcollection_shard1_replica1 core in the old collection to the corresponding Newcollection_shard1_replica1 core in the new collection. Would this approach work?

Lastly, is there anything we need to be aware of in core.properties in the newly created collection, or any reference specific to the new collection?


Thanks & Regards,
Vinodh

Re: How to restore deleted collection from filesystem

Posted by Erick Erickson <er...@gmail.com>.
So what I’m reading here is that you have the _data_ saved somewhere, right? By “data” I just mean the data directories under the replica.

1> Go ahead and recreate the collection. It _must_ have the same number of shards. Make it leader-only, i.e. replicationFactor == 1
2> The collection will be empty. Now shut down the Solr instances hosting any of the replicas.
3> Replace the data directory under each replica with the corresponding one from the backup. “Corresponding” means from the same shard, which should be obvious from the replica name.
4> Start your Solr instances back up and verify it’s as you expect.
5> Use ADDREPLICA to build out your collection to have as many replicas of each shard as you require. NOTE: I’d do this gradually, maybe 2-3 at a time, then wait for them to become active before adding more. The point here is that each ADDREPLICA will cause the entire index to be copied down from the leader, and with that many documents you don’t want to saturate your network.
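
Steps 1 and 5 go through the Collections API, and could be sketched in Python roughly as follows. This only builds the request URLs and groups the ADDREPLICA calls into small batches; it does not send anything. The base URL, the collection name, and the helper names are placeholders, and parameters such as the configset or any replica placement rules are left out, so treat it as a starting point rather than a complete recipe.

```python
from urllib.parse import urlencode

# Assumption: replace with the address of one of your Solr nodes.
SOLR_BASE = "http://localhost:8983/solr"


def create_url(name, num_shards, base=SOLR_BASE):
    """Step 1: CREATE a leader-only collection (replicationFactor=1)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": 1,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def add_replica_url(collection, shard, base=SOLR_BASE):
    """Step 5: ADDREPLICA for one shard of the restored collection."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    return f"{base}/admin/collections?{urlencode(params)}"


def batches(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    shards = [f"shard{n}" for n in range(1, 6)]  # hypothetical 5-shard layout
    print(create_url("restored", len(shards)))
    for group in batches(shards, 2):
        # Issue these, then wait for the new replicas to become active
        # before moving on to the next group.
        for shard in group:
            print(add_replica_url("restored", shard))
```

Keeping the batch size at 2 or 3 follows the note in step 5: every new replica pulls the whole index from its leader, so a small batch limits how much is copied over the network at once.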

Best,
Erick
