Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/07 02:18:34 UTC

[GitHub] [incubator-doris] pengxiangyu commented on a diff in pull request #9424: [fix] fix bug that replica can not be repaired duo to DECOMMISSION state

pengxiangyu commented on code in PR #9424:
URL: https://github.com/apache/incubator-doris/pull/9424#discussion_r867289520


##########
fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java:
##########
@@ -1569,7 +1574,10 @@ public void handleRunningTablets() {
 
         // 2. release ctx
         timeoutTablets.stream().forEach(t -> {
-            releaseTabletCtx(t, TabletSchedCtx.State.CANCELLED);
+            // Set "resetReplicaState" to true because
+            // the timeout task should also be considered as UNRECOVERABLE,
+            // so need to reset replica state.
+            releaseTabletCtx(t, TabletSchedCtx.State.CANCELLED, true);

Review Comment:
   It would be better to add a log here. It would help us find out why resetReplicaState was called, and it should not produce too many log entries.
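
   A minimal sketch of what such a log could look like, not the code in this PR. The class name, the helper describeTimeoutRelease(), the message wording, and the use of java.util.logging in place of the project's own LOG object are all assumptions made so the example is self-contained; tablets are represented by plain IDs instead of TabletSchedCtx.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the timeout branch of handleRunningTablets().
// None of these names come from the Doris code base.
public class TimeoutReleaseLogSketch {
    private static final Logger LOG = Logger.getLogger(TimeoutReleaseLogSketch.class.getName());

    // Builds the message in one place so the wording can be checked in isolation.
    static String describeTimeoutRelease(long tabletId, boolean resetReplicaState) {
        return String.format(
                "tablet %d timed out while running, releasing ctx as CANCELLED, resetReplicaState=%b",
                tabletId, resetReplicaState);
    }

    // Logs one warning per timed-out tablet and returns how many were handled.
    static int releaseTimedOut(List<Long> timeoutTabletIds) {
        int released = 0;
        for (long tabletId : timeoutTabletIds) {
            LOG.warning(describeTimeoutRelease(tabletId, true));
            // The real code would call
            // releaseTabletCtx(t, TabletSchedCtx.State.CANCELLED, true) here.
            released++;
        }
        return released;
    }

    public static void main(String[] args) {
        releaseTimedOut(Arrays.asList(10001L, 10002L));
    }
}
```

   Logging once per timed-out tablet keeps the volume bounded by the number of timeouts, which is why the extra log should stay small.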



##########
fe/fe-core/src/main/java/org/apache/doris/clone/TabletSchedCtx.java:
##########
@@ -1184,4 +1186,24 @@ public int compare(Replica r1, Replica r2) {
             }
         }
     }
+
+    // call this when releaseTabletCtx()
+    public void resetReplicaState() {
+        if (tablet != null) {
+            for (Replica replica : tablet.getReplicas()) {
+                // To address issue: https://github.com/apache/incubator-doris/issues/9422
+                // the DECOMMISSION state is set in TabletScheduler and not persist to meta.
+                // So it is reasonable to reset this state if we failed to scheduler this tablet.
+                // That is, if the TabletScheduler cannot process the tablet, then it should reset
+                // any intermediate state it set during the scheduling process.
+                if (replica.getState() == ReplicaState.DECOMMISSION) {
+                    replica.setState(ReplicaState.NORMAL);
+                    replica.setWatermarkTxnId(-1);
+                    LOG.debug("reset replica {} on backend {} of tablet {} state from DECOMMISSION to NORMAL",

Review Comment:
   LOG.warn() is better. resetReplicaState will not be called frequently, so this will not produce too many log entries, but we need to know which tablet was reset in order to find out why it ended up in this state.
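
   A self-contained sketch of the suggested change, under stated assumptions: Replica and ReplicaState below are simplified stand-ins rather than the Doris classes, the method takes the tablet ID and replica list as parameters instead of reading them from TabletSchedCtx, and java.util.logging's warning() stands in for LOG.warn(). The return value is added only so the behaviour can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical, simplified model of resetReplicaState() with a warn-level log.
public class ResetReplicaStateSketch {
    private static final Logger LOG = Logger.getLogger(ResetReplicaStateSketch.class.getName());

    // Stand-in for the replica states relevant to this fix.
    enum ReplicaState { NORMAL, DECOMMISSION }

    // Stand-in for a replica: only the fields the reset touches or logs.
    static final class Replica {
        final long id;
        final long backendId;
        ReplicaState state;
        long watermarkTxnId;

        Replica(long id, long backendId, ReplicaState state, long watermarkTxnId) {
            this.id = id;
            this.backendId = backendId;
            this.state = state;
            this.watermarkTxnId = watermarkTxnId;
        }
    }

    // Resets every DECOMMISSION replica to NORMAL, logging each reset at
    // warning level so the affected tablet can be found later.
    // Returns the number of replicas that were reset.
    static int resetReplicaState(long tabletId, List<Replica> replicas) {
        int resetCount = 0;
        for (Replica replica : replicas) {
            if (replica.state == ReplicaState.DECOMMISSION) {
                replica.state = ReplicaState.NORMAL;
                replica.watermarkTxnId = -1;
                LOG.warning(String.format(
                        "reset replica %d on backend %d of tablet %d state from DECOMMISSION to NORMAL",
                        replica.id, replica.backendId, tabletId));
                resetCount++;
            }
        }
        return resetCount;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica(1L, 10001L, ReplicaState.DECOMMISSION, 42L));
        replicas.add(new Replica(2L, 10002L, ReplicaState.NORMAL, -1L));
        resetReplicaState(20001L, replicas);
    }
}
```

   Replicas already in NORMAL state are skipped without logging, so the warning appears only for tablets whose state was actually changed.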



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

