Posted to solr-user@lucene.apache.org by Daniel Bryant <da...@tai-dev.co.uk> on 2014/02/17 16:32:20 UTC

Best way to copy data from SolrCloud to standalone Solr?

Hi all,

I have a production SolrCloud server which has multiple sharded indexes, 
and I need to copy all of the indexes to a (non-cloud) Solr server 
within our QA environment.

Can I ask for advice on the best way to do this please?

I've searched the web and found solr2solr
(https://github.com/dbashford/solr2solr), but the author states that
it is best suited to small indexes, and ours are rather large at ~20 GB each.
I've also looked at replication, but can't find a definitive reference on
how this should be done between SolrCloud and standalone Solr.

Any guidance is very much appreciated.

Best wishes,

Daniel



-- 
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
<http://www.tai-dev.co.uk/>*
daniel.bryant@tai-dev.co.uk <ma...@tai-dev.co.uk>  |  +44 
(0) 7799406399  |  Twitter: @taidevcouk <https://twitter.com/taidevcouk>

Re: Best way to copy data from SolrCloud to standalone Solr?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
There's a related issue: SOLR-5340 - Add support for named snapshots.
I think we'd want this in SolrCloud soon.

https://issues.apache.org/jira/browse/SOLR-5340

On Tue, Feb 18, 2014 at 7:23 PM, Daniel Bryant
<da...@tai-dev.co.uk> wrote:
> Hi Shawn, Michael,
>
> Many thanks for your responses - we're going to try the replication/backup
> command, as we think this is a 'two birds with one stone' approach
> which will not only let us copy the indexes, but also help with backups
> in SolrCloud.
>
> Thanks again to you both!
>
> Best wishes,
>
> Daniel



-- 
Regards,
Shalin Shekhar Mangar.

Re: Best way to copy data from SolrCloud to standalone Solr?

Posted by Daniel Bryant <da...@tai-dev.co.uk>.
Hi Shawn, Michael,

Many thanks for your responses - we're going to try the
replication/backup command, as we think this is a 'two birds with
one stone' approach which will not only let us copy the indexes,
but also help with backups in SolrCloud.

Thanks again to you both!

Best wishes,

Daniel



On 17/02/2014 20:25, Michael Della Bitta wrote:
> I do know for certain that the backup command on a cloud core still works.
> We have a script like this running on a cron to snapshot indexes:
>
> curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp'
>
> (not really using /tmp for this, parameters changed to protect the guilty)
>
> The admin handler for replication doesn't seem to be there, but the actual
> API seems to work normally.


Re: Best way to copy data from SolrCloud to standalone Solr?

Posted by Michael Della Bitta <mi...@appinions.com>.
I do know for certain that the backup command on a cloud core still works.
We have a script like this running on a cron to snapshot indexes:

curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp'

(not really using /tmp for this, parameters changed to protect the guilty)

The admin handler for replication doesn't seem to be there, but the actual
API seems to work normally.
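
For illustration only, here is a minimal sketch of how a cron script along
these lines might loop over several cores. The function names, base URL,
and core names are hypothetical placeholders, not the actual script
described above. As far as I know the backup command is asynchronous, so a
successful curl call only means the snapshot was requested; the handler's
command=details output should show when it has finished.

```shell
#!/bin/sh
# Hypothetical helper: build the replication-handler backup URL for one core.
# Arguments: base Solr URL, core name, backup location, snapshots to keep.
# Note: the location is not URL-encoded here; plain paths work as-is.
backup_url() {
    printf '%s/%s/replication?command=backup&numberToKeep=%s&location=%s\n' \
        "$1" "$2" "$4" "$3"
}

# Hypothetical helper: request a snapshot of every core named after the
# first three arguments (base URL, backup location, snapshots to keep).
backup_cores() {
    base="$1"; location="$2"; keep="$3"
    shift 3
    for core in "$@"; do
        curl -s "$(backup_url "$base" "$core" "$location" "$keep")" || return 1
    done
}
```

A crontab entry could then source this file and call something like
backup_cores http://localhost:8080/solr /backups 4 core1 core2, with all of
those values adjusted to the local setup.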

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey <so...@elyograg.org> wrote:

> If the master index isn't changing at the time of the copy, and you're
> on a non-Windows platform, you should be able to copy the index
> directory directly.  On a Windows platform, whether you can copy the
> index while Solr is using it would depend on how Solr/Lucene opens the
> files.  A typical Windows file open will prevent anything else from
> opening them, and I do not know whether Lucene is smarter than that.
>
> SolrCloud requires the replication handler to be enabled on all configs,
> but during normal operation, it does not actually use replication.  This
> is a confusing thing for some users.
>
> I *think* you can configure the replication handler on slave cores with
> a non-cloud config that points at the master cores, and it should
> replicate the main Lucene index, but not the config files.  I have no
> idea whether things will work right if you configure other master
> options like replicateAfter and config files, and I also don't know if
> those options might cause problems for SolrCloud itself.  Those options
> shouldn't be necessary for just getting the data into a dev environment,
> though.
>
> Thanks,
> Shawn
>
>

Re: Best way to copy data from SolrCloud to standalone Solr?

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/17/2014 8:32 AM, Daniel Bryant wrote:
> I have a production SolrCloud server which has multiple sharded indexes,
> and I need to copy all of the indexes to a (non-cloud) Solr server
> within our QA environment.
> 
> Can I ask for advice on the best way to do this please?
> 
> I've searched the web and found solr2solr
> (https://github.com/dbashford/solr2solr), but the author states that
> it is best suited to small indexes, and ours are rather large at ~20 GB each.
> I've also looked at replication, but can't find a definitive reference on
> how this should be done between SolrCloud and standalone Solr.
> 
> Any guidance is very much appreciated.

If the master index isn't changing at the time of the copy, and you're
on a non-Windows platform, you should be able to copy the index
directory directly.  On a Windows platform, whether you can copy the
index while Solr is using it would depend on how Solr/Lucene opens the
files.  A typical Windows file open will prevent anything else from
opening them, and I do not know whether Lucene is smarter than that.
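
A minimal sketch of the direct copy on a non-Windows platform, assuming
the index is not being written to while the copy runs. The function name
and the paths in the usage note are hypothetical; for a copy between
hosts, the same idea applies with rsync or scp in place of cp.

```shell
#!/bin/sh
# Hypothetical helper: copy the contents of one Lucene index directory
# into a target directory, creating the target if needed.
copy_index_dir() {
    src="$1"; dst="$2"
    # Refuse to continue if the source index directory does not exist.
    [ -d "$src" ] || { echo "no such index directory: $src" >&2; return 1; }
    # Copy everything inside src (including dotfiles), preserving
    # permissions and timestamps.
    mkdir -p "$dst" && cp -Rp "$src"/. "$dst"/
}
```

Usage would look something like
copy_index_dir /var/solr/core1/data/index /mnt/qa/core1/data/index,
run while the target Solr core is stopped or unloaded.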

SolrCloud requires the replication handler to be enabled on all configs,
but during normal operation, it does not actually use replication.  This
is a confusing thing for some users.

I *think* you can configure the replication handler on slave cores with
a non-cloud config that points at the master cores, and it should
replicate the main Lucene index, but not the config files.  I have no
idea whether things will work right if you configure other master
options like replicateAfter and config files, and I also don't know if
those options might cause problems for SolrCloud itself.  Those options
shouldn't be necessary for just getting the data into a dev environment,
though.

Thanks,
Shawn
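
As a hedged sketch of the replication route without permanent slave
configuration: in Solr 4.x the replication handler accepts a one-off
command=fetchindex request with a masterUrl parameter (newer releases may
use different parameter names, so check the reference guide for your
version). The host names, core names, and function names below are
hypothetical. As far as I understand, fetchindex replaces the target
core's index rather than merging into it, so each shard would need its
own standalone core.

```shell
#!/bin/sh
# Hypothetical helper: print the URL of the source core to pull from.
# Arguments: source base Solr URL, source core name.
source_core_url() {
    printf '%s/%s\n' "$1" "$2"
}

# Hypothetical helper: ask a standalone target core to pull the index
# once from a SolrCloud shard replica core.
# Arguments: target base URL, target core, source base URL, source core.
pull_index() {
    target_base="$1"; target_core="$2"; source_base="$3"; source_core="$4"
    # -G sends the url-encoded pairs as a query string on a GET request.
    curl -s -G "$target_base/$target_core/replication" \
        --data-urlencode 'command=fetchindex' \
        --data-urlencode "masterUrl=$(source_core_url "$source_base" "$source_core")"
}
```

For example, pull_index http://qa-host:8983/solr shard1 http://prod-host:8983/solr collection1_shard1_replica1
would be repeated once per shard, and the target core must have the
replication handler registered in its solrconfig.xml.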