Posted to solr-user@lucene.apache.org by Joe Lerner <jo...@gmail.com> on 2019/06/05 17:18:04 UTC

Solr Migration to The AWS Cloud

Hi,

Our application is migrating from on-premise to AWS. We are currently on
Solr Cloud 7.3.0.

We are interested in exploring ways to do this with minimal down-time, as
in, maybe one hour.

One strategy would be to set up a new empty Solr Cloud instance in AWS, and
reindex the world. But reindexing takes us around 14 hours, so that is not
a viable approach.

I think one very attractive option would be to set up a new live
node/replica in AWS, and, once it replicates, we're essentially
done--literally zero down time (for search anyway). But I don't think we're
going to be able to do that from a networking/security perspective.

From what I've seen, the other option is to copy the Solr index files to
AWS, and somehow use them to set up a new pre-indexed instance. Do I need to
shut down my application and Solr on prem before I copy the files, or can I
copy while things are active?

If I can do the copy while the application is running, I can probably:

1. Copy files to AWS Friday at noon
2. Keep a record of what got re-indexed after Friday at noon (or, heck,
11:45am)
3. Start up the new Solr in AWS against the copied files
4. Reindex the stuff that got re-indexed after Friday at noon
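
Steps 2 and 4 could be sketched roughly as follows. This is only an illustration: it assumes the application keeps some kind of change log with a last-modified time per document, and the names `change_log` and `changed_since` are hypothetical, not anything Solr provides.

```python
from datetime import datetime


def changed_since(records, cutoff):
    """Return the ids of records modified at or after the cutoff."""
    return [doc_id for doc_id, modified in records if modified >= cutoff]


# Hypothetical change log kept by the application: (document id, last modified).
change_log = [
    ("doc-1", datetime(2019, 6, 7, 9, 30)),
    ("doc-2", datetime(2019, 6, 7, 12, 10)),
    ("doc-3", datetime(2019, 6, 8, 8, 0)),
]

# Use a cutoff slightly before the copy started (11:45am rather than noon),
# so nothing that changed during the copy is missed.
cutoff = datetime(2019, 6, 7, 11, 45)
to_reindex = changed_since(change_log, cutoff)
print(to_reindex)
```

Reindexing a document that was already in the copied files is harmless as long as it is sent with the same unique key, so an early cutoff costs a little extra work but not correctness.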

Is there a cleaner/simpler/more official way of moving an index from one
place to another? Export/import, or something like that?

Thanks for any help!

Joe




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr Migration to The AWS Cloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/5/2019 11:18 AM, Joe Lerner wrote:
> Our application is migrating from on-premise to AWS. We are currently on
> Solr Cloud 7.3.0.
> 
> We are interested in exploring ways to do this with minimal down-time, as
> in, maybe one hour.
> 
> One strategy would be to set up a new empty Solr Cloud instance in AWS, and
> reindex the world. But reindexing takes us around 14 hours, so that is not
> a viable approach.

You could go this route by reindexing in AWS and then switching your 
application once the index is ready.

> I think one very attractive option would be to set up a new live
> node/replica in AWS, and, once it replicates, we're essentially
> done--literally zero down time (for search anyway). But I don't think we're
> going to be able to do that from a networking/security perspective.
> 
> From what I've seen, the other option is to copy the Solr index files to
> AWS, and somehow use them to set up a new pre-indexed instance. Do I need to
> shut down my application and Solr on prem before I copy the files, or can I
> copy while things are active?

If the index is changing, life can get interesting.  If it's not 
changing, then as long as the OS permits it (most of them should) you're 
free to copy while Solr is running.

> If I can do the copy while the application is running, I can probably:
> 
> 1. Copy files to AWS Friday at noon
> 2. Keep a record of what got re-indexed after Friday at noon (or, heck,
> 11:45am)
> 3. Start up the new Solr in AWS against the copied files
> 4. Reindex the stuff that got re-indexed after Friday at noon

If your existing cloud is on an OS where rsync is natively available, it 
should be pretty easy to do what you're trying with very little 
downtime, possibly just long enough to reconfigure and restart your 
applications.
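
One way this could look, as a sketch only: a two-pass rsync, with a bulk pass while Solr is still running and a short second pass once indexing has stopped. The two-pass scheme, the paths, and the host name below are assumptions for illustration, not a description of any particular install. The code only builds and prints the commands; nothing is executed.

```python
import shlex


def rsync_command(src_dir, dest, delete=True):
    """Build an rsync command that mirrors src_dir into dest.

    -a preserves permissions and timestamps, -z compresses in transit,
    --delete removes destination files that no longer exist at the source
    (for example, segments that were merged away between passes).
    """
    cmd = ["rsync", "-a", "-z"]
    if delete:
        cmd.append("--delete")
    # A trailing slash on the source copies the directory contents.
    cmd += [src_dir.rstrip("/") + "/", dest]
    return cmd


# Hypothetical paths and host; substitute the real core directories.
SRC = "/var/solr/data/mycollection_shard1_replica_n1/data/index"
DEST = "solr@aws-solr-1:/var/solr/data/mycollection_shard1_replica_n1/data/index/"

# Pass 1: bulk copy while Solr is running and still serving queries.
first_pass = rsync_command(SRC, DEST)
# Pass 2: same command again after indexing has stopped and a commit has
# completed; only files that changed since pass 1 are transferred.
second_pass = rsync_command(SRC, DEST)

print(shlex.join(first_pass))
print(shlex.join(second_pass))
```

Because Lucene writes index segments once and does not modify them afterwards, the second pass typically has far less to transfer than the first.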

> Is there a cleaner/simpler/more official way of moving an index from one
> place to another? Export/import, or something like that?

The Backup/Restore capability in the Collections API is probably the 
most official you're going to get.
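
As a rough sketch of what those calls look like, the snippet below builds the BACKUP and RESTORE request URLs for the Collections API. The host names, collection name, backup name, and location are placeholders, and the snippet only constructs the URLs rather than sending them.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL such as .../admin/collections?action=BACKUP."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Placeholder values; none of these come from a real install.
backup_url = collections_api_url(
    "http://onprem-solr:8983/solr",
    "BACKUP",
    name="migration",
    collection="mycollection",
    location="/mnt/solr_backups",
)
restore_url = collections_api_url(
    "http://aws-solr:8983/solr",
    "RESTORE",
    name="migration",
    collection="mycollection",
    location="/mnt/solr_backups",
)

print(backup_url)
print(restore_url)
```

In SolrCloud the backup location has to be a path that every node can reach (a shared filesystem, for instance), and the backup directory would need to be copied to a matching location on the AWS side before the RESTORE call.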

I will write a followup with the way that I would do this.  That's going 
to take me a while.  I might put it on my blog instead and provide a link.

One thing to note:  Lock down your AWS firewall so only trusted 
systems/people can reach your Solr install.  That's the best way you can 
secure things.

Thanks,
Shawn

Re: Solr Migration to The AWS Cloud

Posted by Jörn Franke <jo...@gmail.com>.
I guess you can do this by switching off the source data center, but you would need to look more closely at your architecture, and especially at the applications that use Solr, to verify this.

It may look easy, but I would test it beforehand.

> On 06.06.2019 at 17:24, Joe Lerner <jo...@gmail.com> wrote:
> 
> Ooohh...interesting. Then, presumably there is some way to have what was the
> cross-data-center replica become the new "primary"? 
> 
> It's getting too easy!
> 
> Joe

Re: Solr Migration to The AWS Cloud

Posted by Joe Lerner <jo...@gmail.com>.
Ooohh...interesting. Then, presumably there is some way to have what was the
cross-data-center replica become the new "primary"? 

It's getting too easy!

Joe


Re: Solr Migration to The AWS Cloud

Posted by Jörn Franke <jo...@gmail.com>.
An alternative to backup and restore could be cross data center replication (CDCR) in Solr:

https://lucene.apache.org/solr/guide/7_3/cross-data-center-replication-cdcr.html
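
For orientation, CDCR is controlled through a per-collection `/cdcr` request handler once it has been configured in solrconfig.xml on both clusters. The sketch below only builds the control URLs; the host and collection names are placeholders, and the configuration itself is described in the guide linked above.

```python
from urllib.parse import urlencode


def cdcr_url(base_url, collection, action):
    """Build a CDCR API URL for one collection, e.g. action=START or STATUS."""
    return f"{base_url}/{collection}/cdcr?{urlencode({'action': action})}"


# Placeholder host and collection name for the source (on-premise) cluster.
start_url = cdcr_url("http://onprem-solr:8983/solr", "mycollection", "START")
status_url = cdcr_url("http://onprem-solr:8983/solr", "mycollection", "STATUS")

print(start_url)
print(status_url)
```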
