Posted to solr-user@lucene.apache.org by Chetas Joshi <ch...@gmail.com> on 2016/11/07 20:49:41 UTC

Re-register a deleted Collection SolrCloud

I have a Solr Cloud deployed on top of HDFS.

I accidentally deleted a collection using the Collections API, so the
ZooKeeper cluster has lost all the info related to that collection. I don't
have a backup that I can restore from. However, I have the indices and
transaction logs on HDFS.

If I create a new collection and copy the existing data directory to the
data directory path of the new collection I have created, will I be able to
go back to the state where I was? Is there anything else I would have to do?

Thanks,

Chetas.

Re: Re-register a deleted Collection SolrCloud

Posted by Chetas Joshi <ch...@gmail.com>.
I won't be able to achieve the correct mapping as I did not store the
mapping info anywhere. I don't know if core_node1 was mapped to
shard1_replica1 or shard2_replica1 in my old collection. But I am not
worried about that, as I am not going to update any existing document.

This is what I did:

I created a new collection with the same schema and the same config.
Shut the SolrCloud down.
Then I copied the data directory:


hadoop fs -cp hdfs://prod/solr53/collection_old/* hdfs://prod/solr53/collection_new/


Re-started the SolrCloud and I could see documents in the Solr UI when I
queried using the "/select" handler.
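
For reference, the whole sequence could be sketched as a shell session. The
Solr host, config name, and `bin/solr` invocations below are placeholders,
not taken from the thread:

```shell
# 1. Create the new collection with the same config as the old one
#    (numShards must match the old collection; names are placeholders).
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_new&numShards=20&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_config"

# 2. Stop the Solr nodes so no cores hold the index directories open.
bin/solr stop -all

# 3. Copy the old index data into the new collection's HDFS path.
hadoop fs -cp 'hdfs://prod/solr53/collection_old/*' hdfs://prod/solr53/collection_new/

# 4. Restart and spot-check with a query against the /select handler.
bin/solr start -cloud
curl "http://localhost:8983/solr/collection_new/select?q=*:*&rows=0"
```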


Thanks!




Re: Re-register a deleted Collection SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
You've got it. You should be quite safe if you:
1> create the same number of shards as you used to have
2> match the shard bits, i.e. collection1_shard1_replica1. As long as
the collection1_shard# parts match, you should be fine. If this isn't
done correctly, the symptom will be that when you update an existing
document, you may eventually get two copies back.
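
To sanity-check that the shard# parts line up before copying anything, a
dry run that only prints the per-shard copy commands can help. The HDFS
paths, directory layout, and shard count here are assumptions, not from
the thread:

```shell
# Build the per-shard copy commands without executing them, so the
# shard1/shard2/... parts can be verified to match on both sides.
# Paths, core directory names, and shard count are assumptions.
OLD=hdfs://prod/solr53/collection_old
NEW=hdfs://prod/solr53/collection_new
CMDS=$(for n in 1 2 3; do
  echo "hadoop fs -cp $OLD/shard${n}_replica1/data $NEW/shard${n}_replica1/data"
done)
echo "$CMDS"
```

Only once the printed source and destination shard numbers agree should the
commands actually be run.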

Best,
Erick


Re: Re-register a deleted Collection SolrCloud

Posted by Chetas Joshi <ch...@gmail.com>.
Thanks Erick.

I had replicationFactor=1 in my old collection and am going to have the same
config for the new collection.
When I create a new collection with number of shards = 20 and max shards per
node = 1, the shards are going to start on 20 hosts out of my 25-host Solr
cluster. When you say "get each shard's index to the corresponding shard on
your new collection", do you mean the following?

shard1_replica1 -> core_node1 (old collection)
shard1_replica1 -> has to be core_node1 (new collection) (I don't have this
mapping for the old collection as the collection no longer exists!!)

Thanks,
Chetas.


Re: Re-register a deleted Collection SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
That should work. The caveat here is that you need to get each
shard's index to the corresponding shard on your new collection.

Of course I'd back up _all_ of these indexes before even starting.

And one other trick. First create your collection with 1 replica per
shard (leader-only). Then copy the indexes (and, btw, I'd have the
associated Solr nodes down during the copy) and verify the collection
is as you'd expect.

Now use ADDREPLICA to expand your collection, that'll handle the
copying from the leader correctly.
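
The leader-only create followed by ADDREPLICA could look like the sketch
below. The host, collection name, config name, and shard list are
placeholders, not from the thread:

```shell
# Create the collection leader-only (replicationFactor=1); names and
# host are placeholders.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_new&numShards=20&replicationFactor=1&collection.configName=my_config"

# ... stop the nodes, copy the indexes into place, restart, verify ...

# Expand each shard with ADDREPLICA; Solr replicates the index from
# the shard leader, so no manual copying is needed for the new replicas.
for shard in shard1 shard2 shard3; do
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection_new&shard=$shard"
done
```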

Best,
Erick
