Posted to solr-user@lucene.apache.org by Wei <we...@gmail.com> on 2017/08/26 01:53:02 UTC

Correct approach to copy index between solr clouds?

Hi,

In our setup there are two Solr clouds:

Cloud A:  production cloud, serves both writes and reads

Cloud B:  backup cloud, serves only writes

Cloud A and B have the same shard configuration.

Write requests are sent to both cloud A and B. In certain circumstances,
when Cloud A's updates lag behind, we want to bulk copy the binary index
from B to A.

We have tried two approaches:

Approach 1.
      For cloud A:
      a. delete collection to wipe out everything
      b. create new collection (data is empty now)
      c. shut down solr server
      d. copy binary index from cloud B to corresponding shard replicas in
cloud A
      e. start solr server

Approach 2.
      For cloud A:
      a.  shut down solr server
      b.  remove the whole 'data' folder (which contains index/) in each replica
      c.  copy binary index from cloud B to corresponding shard replicas in
cloud A (a rough sketch of this copy step is below)
      d.  start solr server
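
For reference, the copy in step c is basically the following per replica
(the paths and collection names are made up; across machines we'd do the
same thing with scp or rsync):

import java.io.IOException;
import java.nio.file.*;

public class CopyShardIndex {
    public static void main(String[] args) throws IOException {
        // Made-up paths: index dir of one shard replica on cloud B (source)
        // and the matching replica on cloud A (target, Solr already shut down).
        Path src = Paths.get("/solr-b/server/solr/mycoll_shard1_replica1/data/index");
        Path dst = Paths.get("/solr-a/server/solr/mycoll_shard1_replica1/data/index");

        Files.createDirectories(dst);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path f : files) {
                // Copy every segment file plus the segments_N file verbatim.
                Files.copy(f, dst.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}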

Is approach 2 sufficient?  I am wondering if deleting/recreating the collection
each time is necessary to get the cloud into a "clean" state for copying the
binary index between Solr clouds.

Thanks for your advice!

Re: Correct approach to copy index between solr clouds?

Posted by Erick Erickson <er...@gmail.com>.
write.lock is used whenever a core (replica) wants to, well, write to
the index. Each individual replica makes sure only one thread writes to
the index. If two threads were to write to an index, there's a very
good chance the index would be corrupted, so it's a safeguard against
two or more threads or processes writing to the same index at the same
time.

Since a dataDir can be pointed at an arbitrary directory, not only
could two replicas point to the same index within the same Solr JVM,
but you could have some completely different JVM, possibly even on a
completely different machine, point at the _same_ directory (the
latter with any kind of shared filesystem).

In the default case, Java's FileChannel.tryLock() is used to acquire
an exclusive lock. If two or more threads in the same JVM, or two or
more processes, point to the same write.lock file, one of the replicas
will fail to open.
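
To make that concrete, here's roughly what that check looks like (the
path below is made up, and the real locking is done inside Lucene's
lock factory, not by code you'd write yourself):

import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WriteLockCheck {
    public static void main(String[] args) throws Exception {
        // Made-up path to a replica's write.lock file.
        Path lockFile = Paths.get("/solr/server/solr/mycoll_shard1_replica1/data/index/write.lock");
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Non-blocking attempt at an exclusive OS-level lock; returns null if
            // another process holds it. (Within one JVM a second tryLock() throws
            // OverlappingFileLockException instead.)
            FileLock lock = channel.tryLock();
            if (lock == null) {
                System.out.println("lock already held; a second writer would fail to open");
            } else {
                System.out.println("lock acquired; this writer owns the index");
                lock.release();
            }
        }
    }
}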

So I misspoke. Just copying the write.lock file from one place to
another along with all the rest of the index files should be OK. Since
it's a new file in a new place, FileChannel.tryLock() can succeed.

You should still be sure that indexing is stopped on the source and a
hard commit has been performed, though. If you just copy from one to
the other while indexing is actively happening you might get a
mismatched segments file.

This last might need a bit of explanation. During normal indexing, new
segment(s) are written. On hard commit (or when background merging
happens), once all the new segment(s) are successfully closed, the
segments file is updated with a list of all of them. This, by the way,
is how an indexSearcher has a "snapshot" of the directory as of the
last commit; it reads the current segments file and opens all the
segments listed there.
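
For example, something like this plain-Lucene sketch (made-up path)
only ever sees what the current segments file pointed at when the
reader was opened:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;

public class CommitSnapshot {
    public static void main(String[] args) throws Exception {
        // Made-up path to a replica's index directory.
        try (FSDirectory dir = FSDirectory.open(Paths.get("/solr/server/solr/mycoll_shard1_replica1/data/index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            // The reader is a snapshot of whatever the latest segments_N listed;
            // segments still being written are invisible to it.
            System.out.println("segments file: " + reader.getIndexCommit().getSegmentsFileName());
            System.out.println("docs visible:  " + reader.numDocs());
        }
    }
}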

Anyway, theoretically if you just copy the current index directory
while indexing is going on, you could potentially have a mismatch
between the truly closed segments and what has been written to the
segments file. This would be avoided by using fetchIndex since that's
been hardened to handle this case, but being sure indexing is stopped
would serve as well.

Best,
Erick


Re: Correct approach to copy index between solr clouds?

Posted by Wei <we...@gmail.com>.
Thanks Erick. Can you explain a bit more about the write.lock file? So far I
have been copying it over from B to A and haven't seen any issue starting the
replica.

Re: Correct approach to copy index between solr clouds?

Posted by Erick Erickson <er...@gmail.com>.
Approach 2 is sufficient. You do have to ensure that you don't copy
over the write.lock file, however, as you may not be able to start
replicas if it's there.

There's a relatively little-known third option. You can (ab)use the
replication API "fetchindex" command, see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication to
pull the index from Cloud B to replicas on Cloud A. That has the
advantage of working even if you are actively indexing to Cloud B.
NOTE: currently you cannot _query_ Cloud A (the target) while the
fetchindex is going on, but I doubt you really care since you were
talking about having Cloud A offline anyway. So for each replica you
fetch to, you'll send the fetchindex command directly to the replica on
Cloud A, and the "masterURL" will be the corresponding replica on Cloud
B.
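
E.g., something along these lines for one shard (hosts, port and core
names are made up, and double-check the replication docs for the exact
parameter spelling, I believe it's masterUrl):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchIndexFromCloudB {
    public static void main(String[] args) throws Exception {
        // Made-up hosts/cores: the replica on cloud A pulls from the matching replica on cloud B.
        String target = "http://cloud-a-node1:8983/solr/mycoll_shard1_replica1";
        String source = "http://cloud-b-node1:8983/solr/mycoll_shard1_replica1/replication";

        URI uri = URI.create(target + "/replication?command=fetchindex&masterUrl=" + source);
        HttpResponse<String> rsp = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(uri).build(),   // plain GET
                      HttpResponse.BodyHandlers.ofString());
        System.out.println(rsp.body());
    }
}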

Finally, what I'd really do is _only_ have one replica for each shard
on Cloud A active and fetch to _that_ replica. I'd also delete the
data dir on all the other replicas for the shard on Cloud A. Then as
you bring the additional replicas up, they'll do a full sync from the
leader.

FWIW,
Erick
