Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2016/12/26 13:50:58 UTC
[jira] [Comment Edited] (SOLR-9835) Create another replication mode for SolrCloud

    [ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15778369#comment-15778369 ] 

Cao Manh Dat edited comment on SOLR-9835 at 12/26/16 1:50 PM:
--------------------------------------------------------------

[~yonik@apache.org][~yseeley@gmail.com]: Here is a scenario for the problem that I encountered today:
- A replica (let's call it rep1) is in recovery mode, so its ulog is in the buffering state.
- rep1 receives an update (containing doc1). rep1 writes the update to its tlog without updating ulog.map for real-time get.
- rep1 replays the buffered updates: it writes doc1 to its index and updates ulog.map for real-time get (but in this case, ulog.map points doc1 -> position = -1, because we don't write an updateCommand with the REPLAY flag to the tlog).
- A client calls real-time get for doc1.
- rep1 always opens a real-time searcher in this case, because ulog.map returns position = -1 for doc1.

I just wonder why we do that currently. Why don't we just write the update to both the tlog and ulog.map, so that we don't have to open a new real-time searcher in this case?
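To make the sequence above concrete, here is a minimal toy model. Everything in it (`ToyUpdateLog`, its methods, and the hypothetical `mapBufferedUpdates` flag standing in for the alternative asked about) is an illustrative assumption, not Solr's `UpdateLog` API. It only tracks whether a real-time get can be served from the tlog or has to fall back to opening a real-time searcher.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the scenario above; NOT Solr's UpdateLog API.
class ToyUpdateLog {
    static final long NOT_IN_TLOG = -1L;

    private final List<String[]> tlog = new ArrayList<>();     // simulated tlog entries: {id, doc}
    private final Map<String, Long> map = new HashMap<>();     // simulated ulog.map: id -> tlog position
    private final Map<String, String> index = new HashMap<>(); // simulated index (what a searcher sees)
    private final List<Integer> buffered = new ArrayList<>();  // tlog positions buffered during recovery
    private final boolean mapBufferedUpdates;                  // hypothetical flag: the alternative asked about
    private boolean buffering = false;
    private int searcherOpens = 0;

    ToyUpdateLog(boolean mapBufferedUpdates) { this.mapBufferedUpdates = mapBufferedUpdates; }

    void startBuffering() { buffering = true; }

    void add(String id, String doc) {
        tlog.add(new String[] {id, doc});
        long pos = tlog.size() - 1;
        if (!buffering) {
            index.put(id, doc); // normal path: index and map are both updated
            map.put(id, pos);
        } else {
            buffered.add((int) pos);                  // recovery: written to the tlog only...
            if (mapBufferedUpdates) map.put(id, pos); // ...unless the hypothetical flag is set
        }
    }

    void replayBuffered() {
        for (int pos : buffered) {
            String[] entry = tlog.get(pos);
            index.put(entry[0], entry[1]);
            // Modelled current behaviour: the REPLAY command is not re-written to the tlog,
            // so the map entry gets no usable tlog position.
            map.put(entry[0], mapBufferedUpdates ? (long) pos : NOT_IN_TLOG);
        }
        buffered.clear();
        buffering = false;
    }

    String realTimeGet(String id) {
        Long pos = map.get(id);
        if (pos != null && pos >= 0) return tlog.get(pos.intValue())[1]; // served from the tlog
        searcherOpens++; // no usable tlog position: open a real-time searcher
        return index.get(id);
    }

    int searcherOpens() { return searcherOpens; }

    public static void main(String[] args) {
        for (boolean alt : new boolean[] {false, true}) {
            ToyUpdateLog rep1 = new ToyUpdateLog(alt);
            rep1.startBuffering();  // rep1 is recovering: its ulog buffers
            rep1.add("doc1", "v1"); // update arrives while buffering
            rep1.replayBuffered();  // buffered updates are replayed into the index
            String doc = rep1.realTimeGet("doc1");
            System.out.println((alt ? "proposed" : "current") + ": " + doc
                    + ", searcher opens = " + rep1.searcherOpens());
        }
    }
}
```

Running `main` prints `current: v1, searcher opens = 1` and then `proposed: v1, searcher opens = 0`: when replayed entries are mapped to -1 the lookup cannot use the tlog and a searcher is opened, while keeping the buffered tlog position lets the toy answer from the tlog.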




> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine replication: replicas start in the same initial state, and each input is distributed to all replicas, so all replicas end up in the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas.
> - Slow recovery: if a replica misses more than N updates during its downtime, it has to download the entire index from its leader.
> So we create another replication mode for SolrCloud, called state transfer, which acts like master/slave replication. Basically:
> - The leader distributes each update to the other replicas, but only the leader applies the update to its IW (IndexWriter); the other replicas just store the update in their UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight for indexing, because only the leader runs the commits and applies the updates.
> - Very fast recovery: replicas just have to download the missing segments.
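The state-transfer mode described in the issue can be pictured with a small in-memory model. This is a sketch under simplifying assumptions: all class and method names (`StateTransferSketch`, `Leader`, `Replica`, `poll`) are hypothetical, a "segment" is just an immutable named list of documents, and there is no networking, merge policy, or failure handling. It shows the two properties listed under Pros: only the leader commits, and a replica that falls behind copies only the segments it is missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "state transfer" replication; NOT Solr or Lucene API.
class StateTransferSketch {
    // An immutable, named batch of committed documents (stands in for a Lucene segment).
    static final class Segment {
        final String name;
        final List<String> docs;
        Segment(String name, List<String> docs) { this.name = name; this.docs = new ArrayList<>(docs); }
    }

    static final class Leader {
        final Map<String, Segment> segments = new LinkedHashMap<>(); // committed segments
        private final List<String> pending = new ArrayList<>();       // stands in for the leader's IW buffer
        private final List<Replica> replicas = new ArrayList<>();
        private int nextSegment = 0;

        void addReplica(Replica r) { replicas.add(r); }

        void update(String doc) {
            pending.add(doc);                      // only the leader applies the update to its index
            for (Replica r : replicas) r.log(doc); // replicas only append it to their update log
        }

        void commit() {                            // the commit runs on the leader only
            if (pending.isEmpty()) return;
            String name = "_" + nextSegment++;
            segments.put(name, new Segment(name, pending));
            pending.clear();
        }
    }

    static final class Replica {
        final List<String> updateLog = new ArrayList<>();
        final Map<String, Segment> segments = new LinkedHashMap<>();

        void log(String doc) { updateLog.add(doc); }

        // Polls the leader: drops segments the leader no longer lists, copies only missing ones.
        int poll(Leader leader) {
            segments.keySet().retainAll(leader.segments.keySet());
            int copied = 0;
            for (Segment s : leader.segments.values()) {
                if (segments.putIfAbsent(s.name, s) == null) copied++;
            }
            return copied;
        }

        int docCount() {
            int n = 0;
            for (Segment s : segments.values()) n += s.docs.size();
            return n;
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Replica r1 = new Replica();
        Replica r2 = new Replica();
        leader.addReplica(r1);
        leader.addReplica(r2);
        leader.update("doc1");
        leader.update("doc2");
        leader.commit();                     // segment _0 exists on the leader only
        System.out.println(r1.poll(leader)); // 1: r1 copies _0
        leader.update("doc3");
        leader.commit();                     // segment _1
        System.out.println(r1.poll(leader)); // 1: r1 copies only the missing _1
        System.out.println(r2.poll(leader)); // 2: r2 had not polled yet, copies _0 and _1
        System.out.println(r2.docCount());   // 3
    }
}
```

Mirroring the leader's segment set (dropping segments the leader no longer lists) is a design choice of this sketch, so that a replica's view keeps following the leader even if the leader later replaces segments, for example after a merge, which the toy does not model.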



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
