Posted to issues@solr.apache.org by "Houston Putman (Jira)" <ji...@apache.org> on 2021/08/18 18:17:00 UTC

[jira] [Commented] (SOLR-15585) Graceful shutdown can cause data loss with PULL replicas

    [ https://issues.apache.org/jira/browse/SOLR-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401288#comment-17401288 ] 

Houston Putman commented on SOLR-15585:
---------------------------------------

In this scenario, all TLOG replicas are offline, correct? Otherwise another TLOG replica would claim leadership, and the PULL replicas would fetch the up-to-date index from it.

It's hard for me to imagine a scenario where all TLOG replicas are gracefully shut down simultaneously while the PULL replicas are still expected to have up-to-date data. I imagine that if this were to happen, the TLOG replicas would have been forced offline (OOM, machine failure, etc.), and there wouldn't have been a chance to replicate to the PULL replicas anyway.

Between the two options below, I definitely prefer the second (unless I am misunderstanding something here):
 * Add shutdown behavior that checks whether the node hosts the last live TLOG replica and, if so, replicates its data to the PULL replicas before shutting down
 * Write advice in the ref guide to always keep at least one TLOG replica available during cluster operations (and to definitely have multiple TLOG replicas for each shard)
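
For reference, the check in the first option could look roughly like the sketch below. This is only an illustration of the decision logic, not Solr code: `Replica`, `ReplicaType`, `index_version`, and `safe_to_shut_down` are hypothetical names, and the sketch assumes each replica exposes a comparable, monotonically increasing index version.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaType(Enum):
    TLOG = "TLOG"
    PULL = "PULL"


@dataclass
class Replica:
    # Hypothetical model of one replica of a shard; not a Solr class.
    name: str
    type: ReplicaType
    live: bool
    index_version: int  # assumed to increase with every commit


def is_last_live_tlog(shard, me):
    """True if no other live TLOG replica could take over leadership."""
    return not any(
        r.type is ReplicaType.TLOG and r.live and r.name != me.name
        for r in shard
    )


def pull_replicas_behind(shard, me):
    """Live PULL replicas whose index is older than this replica's index."""
    return [
        r for r in shard
        if r.type is ReplicaType.PULL and r.live
        and r.index_version < me.index_version
    ]


def safe_to_shut_down(shard, me):
    """Decide whether a graceful shutdown of `me` can proceed right away."""
    if not is_last_live_tlog(shard, me):
        # Another TLOG replica can become leader and serve the PULL replicas.
        return True
    # Last TLOG replica: only safe once every live PULL replica has caught up.
    return not pull_replicas_behind(shard, me)
```

A shutdown hook built on this would wait (or trigger a replication) while `safe_to_shut_down` returns `False`, which is the extra complexity the second option avoids.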

> Graceful shutdown can cause data loss with PULL replicas
> --------------------------------------------------------
>
>                 Key: SOLR-15585
>                 URL: https://issues.apache.org/jira/browse/SOLR-15585
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>
> When using TLOG (leader)+PULL replicas, a clean shutdown of the node containing leaders (for which PULL replicas exist on other nodes) can be complete even before PULL replicas get a chance to sync all recently committed segments. One solution can be to have a check and enforce a replication on the node containing the leaders of such shards before shutting down.


