Posted to notifications@asterixdb.apache.org by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu> on 2021/11/10 19:30:04 UTC

Change in asterixdb[master]: [NO ISSUE][CLUS] Interrupt global recovery on node failure

From Murtadha Hubail <mh...@apache.org>:

Murtadha Hubail has uploaded this change for review. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14025 )


Change subject: [NO ISSUE][CLUS] Interrupt global recovery on node failure
......................................................................

[NO ISSUE][CLUS] Interrupt global recovery on node failure

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- When a node fails while global recovery is ongoing, interrupt
  recovery to avoid unnecessary waiting.

Change-Id: I58852e046ff4021f4c5d115f5c3488b249fc61a2
---
M asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/GlobalRecoveryManager.java
1 file changed, 12 insertions(+), 1 deletion(-)



  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/25/14025/1

diff --git a/asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/GlobalRecoveryManager.java b/asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/GlobalRecoveryManager.java
index 9438b16..e6ef8df 100644
--- a/asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/GlobalRecoveryManager.java
+++ b/asterixdb/asterix-app/src/main/java/org/apache/asterix/hyracks/bootstrap/GlobalRecoveryManager.java
@@ -23,6 +23,7 @@
 import java.util.Collections;
 import java.util.List;
 import java.util.Set;
+import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
 
 import org.apache.asterix.app.message.StorageCleanupRequestMessage;
@@ -64,6 +65,7 @@
     protected final IHyracksClientConnection hcc;
     protected volatile boolean recoveryCompleted;
     protected volatile boolean recovering;
+    protected Future<?> recoveryFuture;
 
     public GlobalRecoveryManager(ICCServiceContext serviceCtx, IHyracksClientConnection hcc,
             IStorageComponentProvider componentProvider) {
@@ -98,7 +100,7 @@
                      * Perform recovery on a different thread to avoid deadlocks in
                      * {@link org.apache.asterix.common.cluster.IClusterStateManager}
                      */
-                    serviceCtx.getControllerService().getExecutor().submit(() -> {
+                    recoveryFuture = serviceCtx.getControllerService().getExecutor().submit(() -> {
                         try {
                             recover(appCtx);
                         } catch (Throwable e) {
@@ -127,6 +129,9 @@
         MetadataManager.INSTANCE.commitTransaction(mdTxnCtx);
         recoveryCompleted = true;
         recovering = false;
+        synchronized (this) {
+            recoveryFuture = null;
+        }
         LOGGER.info("Global Recovery Completed. Refreshing cluster state...");
         appCtx.getClusterStateManager().refreshState();
     }
@@ -166,6 +171,12 @@
 
     @Override
     public void notifyStateChange(ClusterState newState) {
+        synchronized (this) {
+            if (recovering && newState == ClusterState.UNUSABLE && recoveryFuture != null) {
+                // interrupt the recovery attempt since cluster became unusable during global recovery
+                recoveryFuture.cancel(true);
+            }
+        }
         if (newState != ClusterState.ACTIVE && newState != ClusterState.RECOVERING) {
             recoveryCompleted = false;
         }

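The patch uses the standard java.util.concurrent contract. It keeps the Future returned by ExecutorService.submit, and calling cancel(true) on that Future interrupts the thread running the task. A recovery blocked in an interruptible wait therefore gives up instead of waiting on a node that is gone. The following is a minimal, self-contained sketch of that pattern, not AsterixDB code. The names RecoveryCancellationSketch, State, startRecovery, recoveryDone and demo are hypothetical. State stands in for ClusterState, and a CountDownLatch that is never released stands in for a recovery blocked on a failed node. Unlike the hunk shown above, the sketch assigns the Future while holding the same monitor that notifyStateChange uses. That is a choice made here for clarity, not a claim about the surrounding patch code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class RecoveryCancellationSketch {
    // Hypothetical stand-in for ClusterState; only the values used here.
    enum State { ACTIVE, RECOVERING, UNUSABLE }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile boolean recovering;
    private Future<?> recoveryFuture; // guarded by "this"
    final AtomicReference<String> outcome = new AtomicReference<>("pending");
    final CountDownLatch started = new CountDownLatch(1);
    final CountDownLatch finished = new CountDownLatch(1);

    // Submits the (simulated) recovery and remembers its Future under the lock.
    synchronized void startRecovery(CountDownLatch neverReleased) {
        recovering = true;
        recoveryFuture = executor.submit(() -> {
            try {
                started.countDown();
                // Simulates recovery blocked waiting on a node that has failed.
                neverReleased.await();
                outcome.set("completed");
                recoveryDone();
            } catch (InterruptedException e) {
                // cancel(true) interrupted this thread; restore the flag and stop.
                Thread.currentThread().interrupt();
                outcome.set("interrupted");
            } finally {
                finished.countDown();
            }
        });
    }

    // Mirrors the patch: clear the flag, then drop the Future under the lock.
    private void recoveryDone() {
        recovering = false;
        synchronized (this) {
            recoveryFuture = null;
        }
    }

    // Mirrors the patch: cancel with interruption if the cluster became unusable.
    void notifyStateChange(State newState) {
        synchronized (this) {
            if (recovering && newState == State.UNUSABLE && recoveryFuture != null) {
                recoveryFuture.cancel(true);
            }
        }
    }

    static String demo() throws InterruptedException {
        RecoveryCancellationSketch mgr = new RecoveryCancellationSketch();
        mgr.startRecovery(new CountDownLatch(1));
        mgr.started.await(); // make sure the task is running before cancelling
        mgr.notifyStateChange(State.UNUSABLE);
        boolean done = mgr.finished.await(5, TimeUnit.SECONDS);
        mgr.executor.shutdownNow();
        return done ? mgr.outcome.get() : "timed out";
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // prints "interrupted"
    }
}
```

cancel(true) only delivers an interrupt. If the recovery code were blocked in a call that ignores interruption, it would keep running until that call returned, so the benefit depends on recovery waiting in interruptible operations.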
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14025
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I58852e046ff4021f4c5d115f5c3488b249fc61a2
Gerrit-Change-Number: 14025
Gerrit-PatchSet: 1
Gerrit-Owner: Murtadha Hubail <mh...@apache.org>
Gerrit-MessageType: newchange

Change in asterixdb[master]: [NO ISSUE][CLUS] Interrupt global recovery on node failure

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Murtadha Hubail <mh...@apache.org>:

Murtadha Hubail has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14025 )

Change subject: [NO ISSUE][CLUS] Interrupt global recovery on node failure
......................................................................


Patch Set 1: Code-Review+1


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I58852e046ff4021f4c5d115f5c3488b249fc61a2
Gerrit-Change-Number: 14025
Gerrit-PatchSet: 1
Gerrit-Owner: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Comment-Date: Thu, 11 Nov 2021 14:50:56 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [NO ISSUE][CLUS] Interrupt global recovery on node failure

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ali Alsuliman <al...@gmail.com>:

Ali Alsuliman has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14025 )

Change subject: [NO ISSUE][CLUS] Interrupt global recovery on node failure
......................................................................


Patch Set 1: Code-Review+2


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I58852e046ff4021f4c5d115f5c3488b249fc61a2
Gerrit-Change-Number: 14025
Gerrit-PatchSet: 1
Gerrit-Owner: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Ali Alsuliman <al...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Comment-Date: Thu, 11 Nov 2021 20:53:25 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [NO ISSUE][CLUS] Interrupt global recovery on node failure

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Jenkins <je...@fulliautomatix.ics.uci.edu>:

Jenkins has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/14025 )

Change subject: [NO ISSUE][CLUS] Interrupt global recovery on node failure
......................................................................


Patch Set 1: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/12696/ : SUCCESS


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I58852e046ff4021f4c5d115f5c3488b249fc61a2
Gerrit-Change-Number: 14025
Gerrit-PatchSet: 1
Gerrit-Owner: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-CC: Anon. E. Moose #1000171
Gerrit-Comment-Date: Wed, 10 Nov 2021 21:36:38 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment