Posted to dev@lucene.apache.org by "Tim (JIRA)" <ji...@apache.org> on 2019/02/11 15:51:00 UTC

[jira] [Comment Edited] (SOLR-13141) replicationFactor param cause problems with CDCR

    [ https://issues.apache.org/jira/browse/SOLR-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765087#comment-16765087 ] 

Tim edited comment on SOLR-13141 at 2/11/19 3:50 PM:
-----------------------------------------------------

Unfortunately, I've run into the same issue and have not been able to find a workaround. [http://lucene.472066.n3.nabble.com/CDCR-Unable-to-locate-core-td4423181.html#a4423708]

In 7.3 a fix was put in so that, once CDCR (leader-to-leader replication) has completed, the source collection sends a core recovery command to the target collection's follower replicas. However, it seems to be sending this command to the wrong nodes (hence the "unable to locate core" error): the command is being sent to the node of the leader replica instead of the node of the follower replica.
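
To make the suspected misrouting concrete, here is a minimal sketch in Python. It is not Solr's actual code: the replica dictionaries are a simplified, hypothetical stand-in for the cluster state, and the core names and port 8983 are assumptions for illustration only. It contrasts pairing each follower core with its own node against pairing it with the leader's node, which would produce a request for a core that does not exist on that node.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified view of one shard's replicas. This is NOT Solr's
# real cluster-state class; the keys only resemble fields seen in state.json.
Replica = Dict[str, str]


def recovery_targets(replicas: List[Replica]) -> List[Tuple[str, str]]:
    """Return (node_name, core) pairs, one per follower (non-leader) replica."""
    targets = []
    for replica in replicas:
        if replica.get("leader") == "true":
            continue  # the leader received the updates directly; skip it
        # Pair the follower's core with the follower's OWN node.
        targets.append((replica["node_name"], replica["core"]))
    return targets


def misrouted_targets(replicas: List[Replica]) -> List[Tuple[str, str]]:
    """Mimic the suspected bug: follower cores paired with the leader's node."""
    leader_node = next(
        r["node_name"] for r in replicas if r.get("leader") == "true"
    )
    return [
        (leader_node, r["core"]) for r in replicas if r.get("leader") != "true"
    ]


# Illustrative data only: core names and the port are assumptions.
shard1 = [
    {"core": "customer_shard1_replica_n1",
     "node_name": "dcb_node_1:8983_solr", "leader": "true"},
    {"core": "customer_shard1_replica_n2",
     "node_name": "dcb_node_2:8983_solr"},
]

print(recovery_targets(shard1))
print(misrouted_targets(shard1))
```

In the second result the follower's core is addressed on the leader's node, where no such core lives, which would be consistent with an "unable to locate core" error.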


was (Author: timsolr):
Unfortunately, I've run into the same issue and have not been able to find a workaround. [http://lucene.472066.n3.nabble.com/CDCR-Unable-to-locate-core-td4423181.html#a4423708]

In 7.3 a fix was put in so that, once CDCR has completed, the source collection sends a core recovery command to the follower replicas. However, it seems to be sending this command to the wrong nodes (hence the "unable to locate core" error): the command is being sent to the node of the leader replica instead of the node of the follower replica.

> replicationFactor param cause problems with CDCR
> ------------------------------------------------
>
>                 Key: SOLR-13141
>                 URL: https://issues.apache.org/jira/browse/SOLR-13141
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>    Affects Versions: 7.5, 7.6
>         Environment: This is a system-independent problem - it exists on Windows and Linux - reproduced by independent developers
>            Reporter: Krzysztof Watral
>            Priority: Critical
>         Attachments: type 1 - replication wasnt working at all.txt, type 2 - only few documents were being replicated.txt
>
>
> I have encountered some problems with CDCR that are related to the value of the {{replicationFactor}} param.
> I ran SolrCloud in two datacenters with 2 nodes in each:
>  * dca:
>  ** dca_node_1
>  ** dca_node_2
>  * dcb:
>  ** dcb_node_1
>  ** dcb_node_2
> Then in sequence:
>  * I configured CDCR on a copy of the *_default* config set, named *_default_cdcr*
>  * I created the collection "customer" in both DCs from the *_default_cdcr* config set with the following parameters:
>  ** {{numShards}} = 2
>  ** {{maxShardsPerNode}} = 2
>  ** {{replicationFactor}} = 2
>  * I disabled the CDCR buffer on the collections
>  * I started CDCR in both DCs
> CDCR started without errors in the logs. During indexing I encountered the problem described in [^type 2 - only few documents were being replicated.txt]; a restart didn't help (documents were not synchronized between the DCs)
> Then:
>  * I stopped CDCR in both DCs
>  * I stopped all Solr nodes
>  * I restarted ZooKeeper in both DCs
>  * I started all Solr nodes one by one
>  * a few minutes later I started CDCR in both DCs
>  * CDCR started with errors (replication between the DCs is not working) - [^type 1 - replication wasnt working at all.txt]
> {panel}
> I've also discovered that the problems appear only when the {{replicationFactor}} parameter is higher than one
> {panel}
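
For anyone trying to reproduce this, the setup steps in the quoted description could be scripted roughly as below. This is only a sketch under stated assumptions: the base URLs (host names taken from the node list above, port 8983) are hypothetical, and the code only builds the request URLs for the Collections API CREATE call and the CDCR DISABLEBUFFER/START actions without sending anything.

```python
from urllib.parse import urlencode

# Assumed base URLs, one node per datacenter; adjust to the real hosts.
BASE_URLS = {
    "dca": "http://dca_node_1:8983/solr",
    "dcb": "http://dcb_node_1:8983/solr",
}


def create_collection_url(base_url: str, name: str, config_name: str) -> str:
    """Build a Collections API CREATE URL with the parameters from the report."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 2,
        "maxShardsPerNode": 2,
        "replicationFactor": 2,  # the report says problems appear only above 1
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL for one collection, e.g. DISABLEBUFFER or START."""
    return f"{base_url}/{collection}/cdcr?{urlencode({'action': action})}"


def setup_urls(collection: str = "customer", config_name: str = "_default_cdcr"):
    """List the request URLs in the order given in the description."""
    urls = []
    for base_url in BASE_URLS.values():
        urls.append(create_collection_url(base_url, collection, config_name))
    for action in ("DISABLEBUFFER", "START"):
        for base_url in BASE_URLS.values():
            urls.append(cdcr_url(base_url, collection, action))
    return urls


for url in setup_urls():
    print(url)
```

Changing `replicationFactor` to 1 in `create_collection_url` gives the configuration that, according to the report, does not show the problem, which makes it easy to compare the two cases.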


