Posted to solr-user@lucene.apache.org by "Kelly, Frank" <fr...@here.com> on 2017/02/10 18:12:08 UTC

Copying SolrCloud collections (Replication? Backup/Restore?)

Hello,

  We have 100M+ documents across 2 collections and need to reindex both collections in their entirety, because we need to turn on “docValues”:true on a number of fields (see previous emails from this week :-] ).
Unfortunately we have 4 AWS regions, each with its own SolrCloud cluster holding its own copy of the entire search index.
So we have to do this reindex 4 times, and in each case we have to take the region down because we need to delete the collection. Reindexing takes about 2-3 days.

Is there some way we can reindex in one (offline) region and then use some mechanism (replication? backup/restore? EBS snapshot?) to “copy and paste” a known Solr state from one SolrCloud instance to another?
From that state we’d then just reindex the delta (from when the snapshot was taken to now).

I'd appreciate any thoughts or ideas, or hearing how other folks do it.

Thanks!

-Frank

Frank Kelly

Principal Software Engineer



HERE

5 Wayside Rd, Burlington, MA 01803, USA

42° 29' 7" N 71° 11' 32" W


Re: Copying SolrCloud collections (Replication? Backup/Restore?)

Posted by "Kelly, Frank" <fr...@here.com>.
Thanks, Erick, for that idea and the fast response.


Cheers!

F



Re: Copying SolrCloud collections (Replication? Backup/Restore?)

Posted by Erick Erickson <er...@gmail.com>.
First, perhaps the slickest way to reindex with less downtime would be to
index to a _new_ collection, then use "collection aliasing" to redirect
incoming requests from the old collection to the new one. True, you do
need extra hardware....
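
As a rough sketch of the aliasing step: the Collections API has a
CREATEALIAS action, and re-issuing it with the same alias name repoints the
alias. The base URL, alias name and collection name below are made-up
placeholders, and the sketch assumes clients query the alias name rather
than the collection name directly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS URL that points `alias` at `collection`."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,              # the name clients query (hypothetical)
        "collections": collection,  # the freshly reindexed collection (hypothetical)
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


def repoint_alias(solr_base, alias, collection, timeout=30):
    """Send the CREATEALIAS request and return Solr's raw response body."""
    with urlopen(create_alias_url(solr_base, alias, collection), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (not executed here, needs a live cluster):
# repoint_alias("http://localhost:8983/solr", "products", "products_v2")
```

Once the new collection has been verified, a single call like this switches
traffic over, and the old collection can be deleted afterwards.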

But that aside, Solr (well, Lucene really) indexes are just files. There's a
collection-wide backup/restore, but check the Reference Guide PDF for your
Solr version to see if it's available to you.
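
If your version does have it, the backup/restore pair can be driven through
the Collections API BACKUP and RESTORE actions. This is only a sketch: the
host names, backup name and location are invented, and it assumes the
location is a path every node can reach (and that you have some way to get
the backup files from one region to the other).

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, **params):
    """Build a Collections API URL from keyword parameters."""
    params["wt"] = "json"
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


def backup_url(solr_base, collection, backup_name, location):
    """URL that asks Solr to back up `collection` under `location`."""
    return collections_api_url(
        solr_base, action="BACKUP", name=backup_name,
        collection=collection, location=location,
    )


def restore_url(solr_base, collection, backup_name, location):
    """URL that asks Solr to restore `backup_name` into a new `collection`."""
    return collections_api_url(
        solr_base, action="RESTORE", name=backup_name,
        collection=collection, location=location,
    )


# Hypothetical usage: back up in region 1, ship the files, restore in region 2.
src = backup_url("http://region1:8983/solr", "docs_v2", "docs_v2_snap", "/mnt/backups")
dst = restore_url("http://region2:8983/solr", "docs_v2", "docs_v2_snap", "/mnt/backups")
print(src)
print(dst)
```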

Beyond that, just copy things around. So here's a process, modify as you
see fit:
1> Index to your new collection in region 1.
2> In region 2, create a new collection with the same number of shards (no
followers, leader-only).
3> With the Solr instances in region 2 down, copy the data dir from your
servers in region 1 to the corresponding data dir on your servers in region
2. It is _very_ important that the hash ranges match. If you look at your
state.json you'll see an entry for each shard like
"range":"80000000-ffffffff". The hash range on the source must exactly match
the hash range on the destination in region 2. Double check this, as you
basically copy from collection_shard1_replica1...data (in region 1) to
collection_shard1_replica1...data in region 2.
4> Once this is done for all shards, bring up Solr in region 2 and verify
it's as you expect.
5> Use the Collections API to ADDREPLICA in region 2 to build out your
collection. ADDREPLICA will automatically copy the index from the leader.
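
The hash-range check in step 3 is easy to get wrong by eye, so here is a
small sketch that compares the per-shard ranges from two state.json
documents and builds the ADDREPLICA URL for step 5. It assumes you have
already fetched each cluster's state.json as text (for example from
ZooKeeper); the collection name, host name and sample JSON are invented for
illustration.

```python
import json
from urllib.parse import urlencode


def shard_ranges(state_json_text, collection):
    """Map each shard name to its hash range string from a state.json document."""
    state = json.loads(state_json_text)
    shards = state[collection]["shards"]
    return {name: info.get("range") for name, info in shards.items()}


def range_mismatches(source, dest):
    """Return {shard: (source_range, dest_range)} for every shard that differs."""
    problems = {}
    for shard in sorted(set(source) | set(dest)):
        if source.get(shard) != dest.get(shard):
            problems[shard] = (source.get(shard), dest.get(shard))
    return problems


def add_replica_url(solr_base, collection, shard):
    """Build a Collections API ADDREPLICA URL for one shard."""
    params = {"action": "ADDREPLICA", "collection": collection,
              "shard": shard, "wt": "json"}
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Invented sample data standing in for the two regions' state.json files.
region1 = ('{"docs_v2": {"shards": {'
           '"shard1": {"range": "80000000-ffffffff"},'
           '"shard2": {"range": "0-7fffffff"}}}}')
region2 = ('{"docs_v2": {"shards": {'
           '"shard1": {"range": "0-7fffffff"},'
           '"shard2": {"range": "80000000-ffffffff"}}}}')

bad = range_mismatches(shard_ranges(region1, "docs_v2"),
                       shard_ranges(region2, "docs_v2"))
if bad:
    print("Do NOT copy shard-for-shard; ranges differ:", bad)
```

An empty result means shardN in region 1 can be copied onto shardN in
region 2. A non-empty one means the data dirs have to be matched up by range
rather than by shard name before anything is copied.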

Best,
Erick
