Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2016/04/04 22:21:25 UTC

[jira] [Updated] (SOLR-6465) CDCR: fall back to whole-index replication when tlogs are insufficient

     [ https://issues.apache.org/jira/browse/SOLR-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-6465:
----------------------------------------
    Attachment: SOLR-6465.patch

This is the first cut for this feature.

The CdcrRequestHandler supports a new asynchronous command called BOOTSTRAP which triggers a full index replication from a given master URL. A corresponding BOOTSTRAP_STATUS command reports whether a bootstrap operation is still running, has finished successfully, or has failed.
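
Because BOOTSTRAP is asynchronous, a caller has to poll BOOTSTRAP_STATUS until the operation leaves the running state. The following is a minimal sketch of such a polling loop, not code from the patch. The enum values, the class name BootstrapPoller and the Supplier standing in for a BOOTSTRAP_STATUS request are all hypothetical; the actual response format may differ.

```java
import java.util.function.Supplier;

/** Hypothetical status values; the patch's real response format may differ. */
enum BootstrapStatus { RUNNING, COMPLETED, FAILED }

final class BootstrapPoller {
    /**
     * Polls statusCall (a stand-in for a BOOTSTRAP_STATUS request) until the
     * status is no longer RUNNING or maxPolls calls have been made.
     * Returns the last status seen.
     */
    static BootstrapStatus awaitBootstrap(Supplier<BootstrapStatus> statusCall,
                                          int maxPolls, long sleepMillis) {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be >= 1");
        }
        BootstrapStatus status = BootstrapStatus.RUNNING;
        for (int i = 0; i < maxPolls; i++) {
            status = statusCall.get();
            if (status != BootstrapStatus.RUNNING) {
                return status; // finished, either successfully or not
            }
            if (i < maxPolls - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop polling early.
                    Thread.currentThread().interrupt();
                    return status;
                }
            }
        }
        return status; // still RUNNING after maxPolls attempts
    }
}
```

A caller that gets RUNNING back after the last poll has to decide for itself whether to keep waiting or to treat the bootstrap as stuck.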

The "shardcheckpoint" command has been modified to return the max version across the index and update log using the same updateVersionToHighest logic used to initialize version buckets from tlog+index during startup/reload.
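
To illustrate what "max version across the index and update log" means, here is a small sketch. It is not the updateVersionToHighest code; ShardCheckpointSketch and its parameters are hypothetical, and the use of absolute values rests on the assumption that delete operations are recorded in the tlog with negated versions.

```java
import java.util.Collection;

final class ShardCheckpointSketch {
    /**
     * Returns the highest version seen in either the index or the update log.
     * maxVersionInIndex: highest version found in the index (hypothetical input).
     * tlogVersions: versions read from the update log (hypothetical input).
     */
    static long maxVersion(long maxVersionInIndex, Collection<Long> tlogVersions) {
        long max = maxVersionInIndex;
        for (long v : tlogVersions) {
            // Assumption: deletes carry negative versions, so compare magnitudes.
            max = Math.max(max, Math.abs(v));
        }
        return max;
    }
}
```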

The CdcrReplicatorManager calls collectioncheckpoint to read the max version indexed on the target. If it finds that there is a gap in its tlog, it asks the target cluster to bootstrap itself from the current shard leader on the source. While the bootstrap runs, a flag is set in CdcrReplicatorState so that the CdcrReplicatorScheduler does not send any updates to the target cluster. Once the bootstrap is complete, collectioncheckpoint is called again and the returned version is used to open a regular tlog reader, from which the normal CDCR replication mechanism takes over.
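
The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: ReplicatorSketch, its Target interface and all method names are hypothetical, the gap check is assumed to be a comparison between the target's checkpoint and the oldest version still present in the source tlog, and the bootstrap call is shown as blocking although the real command is asynchronous.

```java
final class ReplicatorSketch {
    /** Hypothetical stand-in for requests sent to the target cluster. */
    interface Target {
        long collectionCheckpoint();                // max version indexed on the target
        void bootstrapFrom(String sourceLeaderUrl); // blocking here for simplicity
    }

    // Plays the role of the flag kept in CdcrReplicatorState.
    private volatile boolean bootstrapping = false;

    /** A scheduler would consult this before forwarding updates. */
    boolean mayForwardUpdates() {
        return !bootstrapping;
    }

    /**
     * Returns the version from which a tlog reader should start.
     * oldestTlogVersion is the oldest version still available in the source tlog.
     */
    long resolveStartVersion(Target target, long oldestTlogVersion, String sourceLeaderUrl) {
        long checkpoint = target.collectionCheckpoint();
        if (checkpoint >= oldestTlogVersion) {
            return checkpoint; // assumed: no gap, the tlog covers what the target lacks
        }
        bootstrapping = true;  // updates are held back while the bootstrap runs
        try {
            target.bootstrapFrom(sourceLeaderUrl);
            // Read the checkpoint again; the tlog reader starts from here.
            return target.collectionCheckpoint();
        } finally {
            // In this sketch the flag is cleared even on failure; the exception
            // still propagates, so no start version is returned in that case.
            bootstrapping = false;
        }
    }
}
```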

A new test called CdcrBootstrapTest is added for this feature. There is some additional code in CdcrUpdateLog which allows one to convert an existing cluster with data to be a cdcr source.

This patch contains plenty of nocommits and debug logging, which I will work to resolve/remove in the next patches. I also found a few bugs for which I'll open separate issues.

Open items/todos:
# Now that we can bootstrap target clusters using the index files, we have no need to keep update logs around for a long time. Therefore, we can get rid of CdcrUpdateLog itself and make CDCR work with regular UpdateLog.
# In the same vein, there is no need to replicate tlog files from leader to replicas on the source cluster, so "lastprocessedversion", CdcrLogSynchronizer and the tlog replication code can be purged.
# Hardening is required against the bootstrap process racing with recovery. Normally this won't happen because bootstrap only happens on target shard leaders, but if/when the leadership changes, I suspect bootstrap can continue to run for a while and race with recovery. I haven't been able to trigger this yet in a test case but I'll continue to work on it.
# In this patch, the bootstrap trigger thread is started on state change, but if it exits due to an unhandled condition, the replication state stays in bootstrapping mode forever and there is no corrective action. Care has been taken to handle most failures, but after implementing this I feel that it is unnecessarily fragile and that we are better off adding some logic to the scheduled replicator component than trying to do bootstrap on init only.
# The existing CDCR tests which test aspects related to tlog replication do not pass currently. Once we yank that code, this would be a non-issue.
# Tests and more tests!

> CDCR: fall back to whole-index replication when tlogs are insufficient
> ----------------------------------------------------------------------
>
>                 Key: SOLR-6465
>                 URL: https://issues.apache.org/jira/browse/SOLR-6465
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-6465.patch
>
>
> When the peer-shard doesn't have transaction logs to forward all the needed updates to bring a peer up to date, we need to fall back to normal replication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
