Posted to commits@ignite.apache.org by sa...@apache.org on 2022/05/19 11:37:14 UTC

[ignite-3] branch main updated: IGNITE-17004 added documentation about a node catching up process.

This is an automated email from the ASF dual-hosted git repository.

sanpwc pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/ignite-3.git


The following commit(s) were added to refs/heads/main by this push:
     new c06b00d3c IGNITE-17004 added documentation about a node catching up process.
c06b00d3c is described below

commit c06b00d3ca87804feaa3e3de45ee4e27a2e14465
Author: Mirza Aliev <al...@gmail.com>
AuthorDate: Thu May 19 14:37:10 2022 +0300

    IGNITE-17004 added documentation about a node catching up process.
---
 modules/raft/tech-notes/nodeCatchUp.md | 67 ++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/modules/raft/tech-notes/nodeCatchUp.md b/modules/raft/tech-notes/nodeCatchUp.md
new file mode 100644
index 000000000..da8d9be7a
--- /dev/null
+++ b/modules/raft/tech-notes/nodeCatchUp.md
@@ -0,0 +1,67 @@
+# Catching up process
+
+## Entry point
+
+We have a closure named `NodeImpl.OnCaughtUp`, which is responsible for the catch-up process of every stale node/replicator on a leader.
+This closure is created every time we call `NodeImpl.ConfigurationCtx#addNewPeers`, which happens on a Raft configuration change, for example
+when we call `NodeImpl#changePeers`. In the `NodeImpl.ConfigurationCtx#addNewPeers` method we attach the `OnCaughtUp` closure to the
+corresponding replicator of a stale node. This is done inside `ReplicatorGroupImpl#waitCaughtUp`
+by calling `Replicator#waitForCaughtUp`. To be more precise, we save the closure in the `Replicator#catchUpClosure` field and also schedule
+a timer on the replicator to call `Replicator#onCatchUpTimedOut` (by default it fires after the election timeout).
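+
+To make this flow concrete, here is a minimal, self-contained sketch of how a catch-up closure can be stored on a replicator together with a
+timeout timer. The class, field, and method names below mirror the ones mentioned above but are simplified stand-ins, not the actual jraft
+signatures.
+
+```java
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+
+/** Simplified stand-in for the closure-keeping part of a replicator. */
+final class ReplicatorSketch {
+    /** Simplified stand-in for the OnCaughtUp closure. */
+    interface CatchUpClosure {
+        void run(boolean success, String errorMessage);
+    }
+
+    private final ScheduledExecutorService timerPool = Executors.newSingleThreadScheduledExecutor();
+
+    private CatchUpClosure catchUpClosure; // analogue of Replicator#catchUpClosure
+    private ScheduledFuture<?> catchUpTimer;
+
+    /** Analogue of Replicator#waitForCaughtUp: remember the closure and arm the timeout timer. */
+    synchronized void waitForCaughtUp(CatchUpClosure done, long timeoutMs) {
+        this.catchUpClosure = done;
+        this.catchUpTimer = timerPool.schedule(this::onCatchUpTimedOut, timeoutMs, TimeUnit.MILLISECONDS);
+    }
+
+    /** Analogue of Replicator#onCatchUpTimedOut: fail the pending closure with a timeout error. */
+    private synchronized void onCatchUpTimedOut() {
+        notifyOnCaughtUp(false, "Catch-up timed out");
+    }
+
+    /** Analogue of Replicator#notifyOnCaughtUp: fire the stored closure at most once. */
+    synchronized void notifyOnCaughtUp(boolean success, String errorMessage) {
+        if (catchUpClosure == null) {
+            return;
+        }
+        if (catchUpTimer != null) {
+            catchUpTimer.cancel(false);
+        }
+        CatchUpClosure closure = catchUpClosure;
+        catchUpClosure = null;
+        closure.run(success, errorMessage);
+    }
+}
+```
+
+In the real code the result propagated to the closure is an `org.apache.ignite.raft.jraft.Status` rather than a plain flag, as described in
+the next section.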
+
+## Closure invocation
+
+Now that the closure is saved, it is time to discuss what happens when it is run. This happens in `Replicator#notifyOnCaughtUp`: we call this
+method with the success or failure status of the catch-up process and propagate it to the closure via `Status.setError(int,
+java.lang.String, java.lang.Object...)`. When the closure is run, `NodeImpl#onCaughtUp` is called; this method checks the status of the
+process, and there are several possible outcomes (a sketch of this dispatch follows the list):
+
+- `Status` is `OK`: `NodeImpl.ConfigurationCtx#onCaughtUp` is called with the success flag set to true, so we can move
+  to `NodeImpl.ConfigurationCtx.nextStage` in the configuration change process.
+- `Status` is an error and, more specifically, a timeout: we retry the catch-up process by creating a new `NodeImpl.OnCaughtUp` closure and
+  calling the same `Replicator#waitForCaughtUp` described before.
+- The retry fails, or `Status` is an error other than a timeout: we call `NodeImpl.ConfigurationCtx#onCaughtUp` with the success flag set
+  to false, so the whole configuration change process is reset with `RaftError.ECATCHUP` and the corresponding
+  `RaftGroupEventsListener#onReconfigurationError` is called. Note that we do not preserve the original reason for the failed catch-up. It is
+  also important that `RaftGroupEventsListener#onReconfigurationError` can be called when the current leader has stepped down. In that case
+  `ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status)` is called with a `null` status, and this `null` value is
+  passed to `RaftGroupEventsListener#onReconfigurationError`.
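+
+The following hypothetical sketch shows the dispatch described above. The enum and method names are simplified stand-ins for the
+`NodeImpl#onCaughtUp` and `ConfigurationCtx` logic, not the actual jraft types.
+
+```java
+/** Simplified outcome of the catch-up process, standing in for a jraft Status. */
+enum CatchUpOutcome { OK, TIMEOUT, OTHER_ERROR }
+
+/** Hypothetical stand-in for the dispatch performed by NodeImpl#onCaughtUp. */
+final class OnCaughtUpDispatchSketch {
+    void onCaughtUp(String peerId, long term, CatchUpOutcome outcome) {
+        switch (outcome) {
+            case OK:
+                // Success: move the configuration change to its next stage.
+                onCaughtUpSucceeded(peerId);
+                return;
+            case TIMEOUT:
+                if (tryRetryCatchUp(peerId, term)) {
+                    // A new closure was registered via waitForCaughtUp; wait for it to run.
+                    return;
+                }
+                // The retry could not be scheduled: fall through to the failure path.
+            case OTHER_ERROR:
+            default:
+                // Failure: reset the whole configuration change with ECATCHUP;
+                // the original error reason is not preserved.
+                onCaughtUpFailed(peerId);
+        }
+    }
+
+    private void onCaughtUpSucceeded(String peerId) { /* ConfigurationCtx#onCaughtUp(success = true) */ }
+
+    private boolean tryRetryCatchUp(String peerId, long term) { /* re-invoke waitForCaughtUp */ return false; }
+
+    private void onCaughtUpFailed(String peerId) { /* ConfigurationCtx#onCaughtUp(success = false) */ }
+}
+```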
+
+## Where the closure is invoked 
+
+Now let's discuss when this closure is run. As we said before, it happens in `Replicator#notifyOnCaughtUp`,
+so let's track who calls `Replicator#notifyOnCaughtUp`.
+
+Calls with successful statuses:
+- `Replicator#onInstallSnapshotReturned` -- called after a snapshot has been successfully installed on the follower
+- `Replicator#onAppendEntriesReturned` -- called after the follower has successfully appended new entries sent by the leader
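+
+In practice, a replicator typically reports the follower as caught up only once it has come close enough to the leader's log. Below is a
+hypothetical sketch of such a condition; the parameter names and the margin threshold are assumptions of this sketch, not taken from the
+jraft code.
+
+```java
+/** Hypothetical sketch of a "has the follower caught up?" condition checked on the success paths. */
+final class CaughtUpConditionSketch {
+    /**
+     * Returns true when the follower lags the leader's log by no more than {@code maxMargin} entries.
+     *
+     * @param leaderLastLogIndex   the leader's last log index
+     * @param followerLastLogIndex the last index known to be replicated to the follower
+     * @param maxMargin            the allowed lag (0 means the follower must be fully up to date)
+     */
+    static boolean isCaughtUp(long leaderLastLogIndex, long followerLastLogIndex, long maxMargin) {
+        return leaderLastLogIndex - followerLastLogIndex <= maxMargin;
+    }
+}
+```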
+
+Calls with error statuses:
+
+* `Replicator#onCatchUpTimedOut` with `RaftError.ETIMEDOUT`, called when the timer event happens. As described before, this timer is started
+  when we call `Replicator#waitForCaughtUp`.
+* `Replicator#onHeartbeatReturned`, `Replicator#onAppendEntriesReturned`, or `Replicator#onTimeoutNowReturned` with `RaftError.EPERM`,
+  called when a follower returns a term higher than the leader's current term. This is a general check for RPC calls where we compare terms
+  and decide whether we should step down or not (a sketch of this check follows the list).
+* `Replicator#onTimeoutNowReturned` with `RaftError.ESTOP`, called when the `stopAfterFinish` flag passed to the method is true. This happens
+  when a leader steps down and we try to wake up a potential candidate for optimisation purposes
+  (see [waking up optimisation](#Waking-up-optimisation)), so we call
+  `Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs())` on the leader. For more details see
+  `NodeImpl#stepDown`.
+* `Replicator#onError` with `RaftError.ESTOP`. This is the general case when a replicator was stopped. For example, it might happen when
+  a leader stepped down, or when a node was shut down, etc. Let's consider all places where `Replicator#onError` with `RaftError.ESTOP` can
+  happen; to do that we need to trace `Replicator#stop`:
+  - `NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure)` -- the node shutdown case.
+  - `ReplicatorGroupImpl#stopAll` -- happens when a leader steps down, which includes stopping all replicators. See `NodeImpl#stepDown`.
+  - `ReplicatorGroupImpl#stopReplicator` -- happens when we call `ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status)`,
+    i.e. when a configuration change has succeeded or failed and we have to start or stop replicators for the new peers.
+  - `ReplicatorGroupImpl#stopAllAndFindTheNextCandidate` -- called when a leader steps down and we apply the
+    [waking up optimisation](#Waking-up-optimisation).
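+
+The term comparison mentioned in the `RaftError.EPERM` item above can be sketched as follows. This is a hypothetical, self-contained
+illustration: the interfaces and names are simplified stand-ins, not the actual jraft API.
+
+```java
+/** Hypothetical sketch of the shared term check performed on RPC responses. */
+final class TermCheckSketch {
+    /** Simplified stand-in for the leader-side node. */
+    interface Leader {
+        void stepDown(long newTerm);
+    }
+
+    /** Simplified stand-in for the replicator holding a pending catch-up closure. */
+    interface CatchUpNotifier {
+        void notifyOnCaughtUp(String errorCode, String reason);
+    }
+
+    /** Returns true when the response is accepted; false when the leader must step down. */
+    static boolean checkResponseTerm(long responseTerm, long leaderTerm, CatchUpNotifier replicator, Leader leader) {
+        if (responseTerm <= leaderTerm) {
+            return true; // Same or stale term: keep acting as a leader.
+        }
+        // The follower has seen a newer term: fail the pending catch-up with EPERM and step down.
+        replicator.notifyOnCaughtUp("EPERM", "Response term " + responseTerm
+                + " is greater than the leader's term " + leaderTerm);
+        leader.stepDown(responseTerm);
+        return false;
+    }
+}
+```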
+
+  
+## Waking up optimisation
+
+When a leader faces a problem and steps down, it applies an optimisation so that a new election with a new candidate starts immediately. In
+that case, instead of simply stopping all replicators, it keeps one replicator and sends a `TimeoutNowRequest` through it before stopping it.
+When the corresponding node receives that request, it starts an election and votes for itself. The stepping-down leader chooses such a node
+by searching for the node with the largest log id among the peers in the current configuration. For more details see
+`ReplicatorGroupImpl#stopAllAndFindTheNextCandidate`.
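+
+A minimal sketch of the candidate selection described above, assuming the leader knows the last log index replicated to each peer (the map
+below and its contents are hypothetical, not the actual `ReplicatorGroupImpl` state):
+
+```java
+import java.util.Map;
+import java.util.Optional;
+
+/** Hypothetical sketch of picking the next candidate for the waking up optimisation. */
+final class WakeUpCandidateSketch {
+    /**
+     * Chooses the peer with the largest replicated log index; the stepping-down leader would then
+     * send a TimeoutNowRequest to this peer instead of just stopping its replicator.
+     *
+     * @param replicatedIndexByPeer peer id mapped to the last log index known to be replicated to it
+     */
+    static Optional<String> findNextCandidate(Map<String, Long> replicatedIndexByPeer) {
+        return replicatedIndexByPeer.entrySet().stream()
+                .max(Map.Entry.comparingByValue())
+                .map(Map.Entry::getKey);
+    }
+}
+```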