Posted to solr-user@lucene.apache.org by "Chaushu, Shani" <sh...@intel.com> on 2015/10/26 12:21:08 UTC

copy data between collection

Hi,
Is there a simple API to copy all the documents from one collection to another collection on the same Solr server?
I'm using SolrCloud 4.10.
Thanks,
Shani

---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Re: copy data between collection

Posted by KNitin <ni...@gmail.com>.
Yes, that is correct. https://github.com/bloomreach/solrcloud-haft helps
precisely with that. You can clone an entire cluster or selected
collections between clusters. It has only been tested up to Solr 4.10.

Let me know if you run into issues
Nitin

On Mon, Oct 26, 2015 at 9:46 AM, Jeff Wartes <jw...@whitepages.com> wrote:

> An alternative tool I’m aware of is this one:
> https://github.com/bloomreach/solrcloud-haft
>
> This says it’s only tested with Solr 4.6, but I’d think it should work.

Re: copy data between collection

Posted by Jeff Wartes <jw...@whitepages.com>.
The “copy” command in this tool automatically does what Upayavira
describes, including bringing any existing replicas up to date:
https://github.com/whitepages/solrcloud_manager


I’ve been using it as a mechanism for copying a collection into a new
cluster (different ZK), but it should work within a cluster too. The
same caveats apply; see the entry in the README.

I’ve also been doing some collection backup/restore work that could be
used to copy a collection within a cluster (back up your collection, then
restore it into a new collection with a different name), but I only just
pushed that and haven’t bundled it into a release yet.

In all cases, you’re responsible for managing the actual collection
definitions yourself.

An alternative tool I’m aware of is this one:
https://github.com/bloomreach/solrcloud-haft

This says it’s only tested with Solr 4.6, but I’d expect it to work with
4.10 as well, since the Solr replication APIs haven’t changed much. I
haven’t used it, but it looks like it has some support for saving ZK data,
which could be useful; that’s one thing I haven’t focused on myself yet.



Re: copy data between collection

Posted by Upayavira <uv...@odoko.co.uk>.
Hi Shani,

There isn't a SolrCloud way to do it. A proper 'clone this collection'
feature would be a very useful thing.

However, I have managed to do it, in a way that involves some caveats:
 * you should only do this on a collection that has no replicas besides
 the leader. Add replicas *after* cloning the index.
 * if you must do it on a sharded index, you will need to do it once
 for each shard. No guarantees, though.
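Before relying on the first caveat, it may be worth checking that the target collection really has a single replica per shard. Below is a minimal Python sketch, assuming you have already parsed the target collection's entry from clusterstate.json (or from a Collections API CLUSTERSTATUS response, if your version offers one) into a dict of the form `{"shards": {"shard1": {"replicas": {...}}}}`. The function name is hypothetical.

```python
def check_safe_to_clone(collection_state):
    """Raise if any shard of the target collection has more than one replica.

    Assumption: `collection_state` is the per-collection object from
    clusterstate.json (or CLUSTERSTATUS), shaped like
    {"shards": {"shard1": {"replicas": {"core_node1": {...}}}, ...}}.
    Returns the shard names, sorted, so each one can be fetched in turn.
    """
    shards = collection_state["shards"]
    # A shard with more than one replica would be left out of sync by a manual fetch
    crowded = sorted(
        name for name, shard in shards.items()
        if len(shard.get("replicas", {})) > 1
    )
    if crowded:
        raise RuntimeError(
            f"add replicas only after cloning; shards with extra replicas: {crowded}"
        )
    return sorted(shards)
```

If it raises, the safer path is to create a fresh target collection with one replica per shard rather than fetch into the existing one.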

All SolrCloud nodes are already enabled as 'replication masters' so
that new replicas can pull a full index from the current leader. We're
going to use this feature to pull our index (assuming a single shard):

http://<your-new-node>:8983/solr/<new-collection>_shard1_replica1/replication?command=fetchindex&masterUrl=http://<your-old-node>:8983/solr/<old-collection>_shard1_replica1/replication

This basically says to the core behind your new collection: "Go to the
core behind the old collection, and pull its entire index".
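The fetchindex URL can be generated instead of hand-edited. Below is a minimal Python sketch, assuming the `<collection>_shardN_replica1` core naming and port 8983 used in the URL above; the function names and the example collection names (`products`, `products_v2`) are hypothetical. For a sharded index it simply produces one URL per shard, matching the "once for each shard" caveat, under the simplifying assumption that every shard's replica1 core sits on the same old/new node pair.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_name(collection, shard, replica=1):
    # Assumes the Solr 4.x default core naming: <collection>_<shard>_replica<N>
    return f"{collection}_{shard}_replica{replica}"


def fetchindex_url(new_node, new_collection, old_node, old_collection,
                   shard="shard1", port=8983):
    """Build the URL asking the new core to pull the old core's full index."""
    master_url = (f"http://{old_node}:{port}/solr/"
                  f"{core_name(old_collection, shard)}/replication")
    # urlencode percent-encodes masterUrl so it travels as a single parameter
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return (f"http://{new_node}:{port}/solr/"
            f"{core_name(new_collection, shard)}/replication?{query}")


def fetchindex_urls(new_node, new_collection, old_node, old_collection,
                    num_shards, port=8983):
    # One fetch per shard (shard1..shardN). Hypothetical simplification:
    # every shard's replica1 core lives on the same old/new node pair.
    return [fetchindex_url(new_node, new_collection, old_node, old_collection,
                           f"shard{i}", port)
            for i in range(1, num_shards + 1)]


def trigger(url, timeout=30):
    # Sends the request; the fetch may still be running after this returns.
    with urlopen(url, timeout=timeout) as resp:
        return resp.status
```

For example, `fetchindex_urls("new-host", "products_v2", "old-host", "products", num_shards=1)` yields one URL equivalent to the one above, and each URL can be passed to `trigger`. Percent-encoding `masterUrl` is a defensive choice; the unencoded form above is what worked in practice.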

This worked for me. I added a replica afterwards, and the index cloned
correctly. However, when I did it against a collection that already had a
replica, the replica *didn't* notice, so the leader and replica were left
out of sync. Really make sure you do this replication before you add
replicas to your new collection.
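One cheap sanity check after the fetch is to compare document counts between the two collections with a `q=*:*&rows=0` query. Below is a minimal Python sketch, assuming both collections are reachable from one base URL and that the old collection receives no writes in the meantime; `fetch` is a hypothetical hook for injecting a stub. Since the fetch may still be running when the request returns, an immediate mismatch does not necessarily mean the copy failed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def _http_get(url):
    # Default fetcher: plain HTTP GET, response body decoded as UTF-8
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def num_found(base_url, collection, fetch=None):
    """Return numFound for q=*:* on `collection` (rows=0, JSON response)."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    url = f"{base_url}/solr/{collection}/select?{query}"
    body = (fetch or _http_get)(url)
    return json.loads(body)["response"]["numFound"]


def counts_match(base_url, old_collection, new_collection, fetch=None):
    # Returns (match, old_count, new_count) so a mismatch can be reported
    old = num_found(base_url, old_collection, fetch)
    new = num_found(base_url, new_collection, fetch)
    return old == new, old, new
```

Equal counts do not prove the indexes are identical, but a clear mismatch after the fetch has finished is a good reason to look closer.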

Hope this helps.

Upayavira
