Posted to solr-user@lucene.apache.org by philippa griggs <ph...@hotmail.co.uk> on 2015/12/15 16:57:17 UTC

Collection API migrate statement

Hello,


Solr 5.2.1.


I'm using the Collections API migrate statement in our test environment with a view to implementing a Hot/Cold arrangement: newer documents will be kept in the Hot collection, and each night the oldest documents will be migrated into the Cold collection. I've got it all working with a small number of documents (around 28,000).


I'm now trying to migrate around 200,000 documents and am getting a 'migrate the collection time out:180s' message back.


The logs from the source collection are:


INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created replica of temp source collection on target leader node
INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp source collection replica to target leader
INFO  - 2015-12-15 14:45:36.648; [   ] org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on path /overseer/collection-queue-work/qnr-0000000004 state SyncConnected
INFO  - 2015-12-15 14:45:36.651; [   ] org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged fired on path /overseer/collection-queue-work state SyncConnected
ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: migrate the collection time out:180s
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
        etc


The logs from the target collection are:

INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; Took 22 ms to seed version buckets with highest version 1520634636692094979
INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  split_shard1_temp_shard2_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. core=split_shard1_temp_shard2_shard1_replica2
INFO  - 2015-12-15 14:43:19.199; [   ] org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}

As there are no errors in the target collection, am I right in assuming the timeout occurred because the merge took too long? If so, how do I increase the timeout period? Ideally I will need to migrate around 2 million documents a night.


Any help would be much appreciated.


Philippa



Re: Collection API migrate statement

Posted by philippa griggs <ph...@hotmail.co.uk>.
Hello,

Thanks for your reply.  

As you suggested, I've tried running the operation with the async parameter and it works, thank you. My next question is: is there any way of finding out more information about the completed task? As I'm currently testing the new Solr configuration, it would be handy to know the runtime of the operation.

Many thanks

Philippa
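
The thread does not establish whether the request status API reports how long a task ran. One possible workaround, sketched below in Python, is to measure the time on the client: note the clock when the async request is submitted and again when polling first reports that it has finished. The function name timed_wait and its parameters are hypothetical, the is_done callable is assumed to wrap a REQUESTSTATUS call, and the result is only accurate to within one polling interval.

```python
import time


def timed_wait(is_done, interval=5.0, max_polls=720,
               clock=time.monotonic, sleep=time.sleep):
    """Return the seconds elapsed until is_done() first returns True.

    is_done is a caller-supplied callable (hypothetical here) that is
    assumed to ask Solr whether the async task has finished.
    """
    start = clock()
    for _ in range(max_polls):
        if is_done():
            # Elapsed wall time as seen by the client, not by Solr.
            return clock() - start
        sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

The timestamps in the Solr logs, such as the "Requesting merge" line on the source side, are another place to look when a rough runtime is enough.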

________________________________________
From: Shalin Shekhar Mangar <sh...@gmail.com>
Sent: 15 December 2015 19:05
To: solr-user@lucene.apache.org
Subject: Re: Collection API migrate statement

Migrate is a long-running operation. Please use it with the
async=<give_your_own_request_id> parameter so that it executes in
the background. You can then use the request status API to poll and
wait until the operation completes. If there is an error, the
same request status API will return it in the response. See
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus



--
Regards,
Shalin Shekhar Mangar.

Re: Collection API migrate statement

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Migrate is a long-running operation. Please use it with the
async=<give_your_own_request_id> parameter so that it executes in
the background. You can then use the request status API to poll and
wait until the operation completes. If there is an error, the
same request status API will return it in the response. See
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
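
As a rough illustration of the advice above, here is a minimal Python sketch that builds an async MIGRATE request and then polls REQUESTSTATUS until the task finishes. Several things in it are assumptions rather than details from this thread: the base URL, the target collection name ColdSessions, the split.key value, the request id, and the exact set of state names returned by Solr. Check them against the Collections API documentation for your version before relying on it.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def collections_url(base, params):
    """Build a Collections API URL, asking for a JSON response."""
    query = dict(params)
    query.setdefault("wt", "json")
    return base + "/admin/collections?" + urlencode(query)


def migrate_url(base, source, target, split_key, request_id):
    """URL that submits MIGRATE in the background under request_id."""
    return collections_url(base, {
        "action": "MIGRATE",
        "collection": source,
        "target.collection": target,
        "split.key": split_key,
        "async": request_id,
    })


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_request(base, request_id, fetch=http_get_json,
                     interval=5.0, max_polls=720, sleep=time.sleep):
    """Poll REQUESTSTATUS until the task is no longer submitted or running."""
    url = collections_url(base, {"action": "REQUESTSTATUS",
                                 "requestid": request_id})
    for _ in range(max_polls):
        status = fetch(url).get("status", {})
        # Assumed state names: submitted, running, completed, failed, notfound.
        if status.get("state") in ("completed", "failed", "notfound"):
            return status
        sleep(interval)
    raise TimeoutError("request %s still not finished" % request_id)
```

With these helpers, http_get_json(migrate_url("http://localhost:8983/solr", "HotSessions", "ColdSessions", "a!", "migrate-001")) would submit the task, and wait_for_request("http://localhost:8983/solr", "migrate-001") would block until Solr reports a final state. The fetch and sleep parameters exist only so the polling loop can be exercised without a running cluster.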




-- 
Regards,
Shalin Shekhar Mangar.

Re: Collection API migrate statement

Posted by Erick Erickson <er...@gmail.com>.
You might look at collection aliasing; this is sometimes used for
time-series data (which I'm guessing this is).

But I have to ask whether migrating stuff around like that is really
necessary. 2M docs isn't very many. Have you stress tested with just
indexing them all to a single collection? Is the traffic on the Hot
part really heavy enough to warrant the complexity?
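
For the aliasing suggestion above, a minimal Python sketch of building a CREATEALIAS request is shown here. It assumes an alias that spans both collections so that queries can use a single name; the alias name Sessions, the collection name ColdSessions, and the base URL are hypothetical and not taken from this thread.

```python
from urllib.parse import urlencode


def create_alias_url(base, alias, collections):
    """Build a Collections API CREATEALIAS URL for the given collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # Solr expects a comma-separated list of collection names.
        "collections": ",".join(collections),
        "wt": "json",
    }
    return base + "/admin/collections?" + urlencode(params)
```

For example, create_alias_url("http://localhost:8983/solr", "Sessions", ["HotSessions", "ColdSessions"]) produces a URL that, when requested, would point the Sessions alias at both collections.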

Best,
Erick
