Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/11 12:02:02 UTC

[GitHub] [ozone] ChenSammi opened a new pull request, #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

ChenSammi opened a new pull request, #3403:
URL: https://github.com/apache/ozone/pull/3403

   https://issues.apache.org/jira/browse/HDDS-6732


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
errose28 commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r872720504


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   Are there other logs indicating that the applyTransaction was running at the same time as, before, or after the snapshot install? Other hypothetical possibilities include:
   - Snapshot install finished but the sequence ID generator had stale RocksDB handles when applyTransaction ran.
   - RocksDB was closed too early and an applyTransaction operation had not yet finished.





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r870898498


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SequenceIdGenerator.java:
##########
@@ -79,7 +79,8 @@ static class Batch {
 
   private final Lock lock;
   private final long batchSize;
-  private final StateManager stateManager;
+  private StateManager stateManager;
+  private final SCMHAManager scmhaManager;

Review Comment:
   Right, it can be removed. 





[GitHub] [ozone] szetszwo commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r874497725


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @ChenSammi , this is great!  Thanks for checking it.





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r870894580


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   Ratis calls StateMachine#pause before installing a snapshot, and calls StateMachine#reinitialize after the state machine has been reloaded.  In SCM's StateMachine#reinitialize, the state machine puts itself back into the RUNNING state.  All of this happens concurrently with the handling of other Ratis appendEntries requests, so we need to check the state machine state in applyTransaction.  Refer to the attached crash file.
   
   The OM state machine implementation does not check the state machine state in applyTransaction, because all transaction output goes into OMDoubleBuffer, and we stop OMDoubleBuffer before reloading the DB and restart it afterwards.
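
A minimal sketch of the pause, reload, reinitialize lifecycle described above, assuming a toy in-memory store. It is not the SCM or Ratis code: `ToyStateMachine`, `LifeCycle`, and the counter store are hypothetical names used only for illustration, and failing the future while paused is one possible behavior, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical toy model; these names are not from Ratis or Ozone.
class ToyStateMachine {
  enum LifeCycle { RUNNING, PAUSED }

  private LifeCycle state = LifeCycle.RUNNING;
  private Map<String, Long> store = new HashMap<>();

  // Called before the snapshot is installed; the store must not be used
  // again until reinitialize() has run.
  synchronized void pause() {
    state = LifeCycle.PAUSED;
  }

  // Called once the snapshot is in place: swap in the new store, then resume.
  synchronized void reinitialize(Map<String, Long> snapshot) {
    store = new HashMap<>(snapshot);
    state = LifeCycle.RUNNING;
  }

  // Adds delta to the counter stored under key. While paused, the returned
  // future fails instead of touching the store.
  synchronized CompletableFuture<Long> applyTransaction(String key, long delta) {
    CompletableFuture<Long> future = new CompletableFuture<>();
    if (state != LifeCycle.RUNNING) {
      future.completeExceptionally(
          new IllegalStateException("state machine is " + state));
      return future;
    }
    future.complete(store.merge(key, delta, Long::sum));
    return future;
  }

  public static void main(String[] args) {
    ToyStateMachine sm = new ToyStateMachine();
    System.out.println(sm.applyTransaction("containerId", 5).join());
    sm.pause();
    System.out.println(
        sm.applyTransaction("containerId", 1).isCompletedExceptionally());
    Map<String, Long> snapshot = new HashMap<>();
    snapshot.put("containerId", 100L);
    sm.reinitialize(snapshot);
    System.out.println(sm.applyTransaction("containerId", 1).join());
  }
}
```

The methods are synchronized so that a store swap can never interleave with an apply. Failing the future keeps the rejection visible to the caller rather than dropping the entry silently.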
   





[GitHub] [ozone] ChenSammi commented on pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on PR #3403:
URL: https://github.com/apache/ozone/pull/3403#issuecomment-1396646584

   The following two JIRAs solved the issue.
   HDDS-5710. initialize sequenceIdToLastIdMap when SequenceIdGenerator#StateManager reinitializes (#2611)
   HDDS-5281. Add reinitialize() for SequenceIdGenerator. (#2292) 




[GitHub] [ozone] errose28 commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
errose28 commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r870852667


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   Is this necessary? Shouldn't Ratis handle making sure we are not applying a transaction while a snapshot is being installed?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SequenceIdGenerator.java:
##########
@@ -79,7 +79,8 @@ static class Batch {
 
   private final Lock lock;
   private final long batchSize;
-  private final StateManager stateManager;
+  private StateManager stateManager;
+  private final SCMHAManager scmhaManager;

Review Comment:
   This looks unused as a field, can we keep it local to the constructor?








[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r871948640


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @szetszwo , sorry, my previous comment was misleading.  The follower crashed when Ratis called StateMachine#applyTransaction through StateMachineUpdater while the SCM state machine was installing a snapshot, not during appendEntries request handling.





[GitHub] [ozone] szetszwo commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r871967410


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @ChenSammi , for a follower, applyTransaction is called only if there are appendEntries calls.  So, we need to understand why this is happening.
   
   BTW, we cannot change applyTransaction to ignore the log.  Otherwise, the states among the servers will diverge silently.
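
To illustrate the divergence concern, here is a toy model of two replicas replaying the same log, where one of them drops an entry without reporting an error. `DivergenceDemo` and `replay` are hypothetical names, not Ratis code, and the "state" is just a counter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; not Ratis code.
class DivergenceDemo {
  // Replays the log onto a counter starting at 0. The entry at discardedIndex
  // (if any) is skipped, as if applyTransaction had ignored it.
  static long replay(List<Long> log, int discardedIndex) {
    long value = 0;
    for (int i = 0; i < log.size(); i++) {
      if (i == discardedIndex) {
        continue; // entry dropped without any error being reported
      }
      value += log.get(i);
    }
    return value;
  }

  public static void main(String[] args) {
    List<Long> log = Arrays.asList(1L, 2L, 3L);
    long leader = replay(log, -1);   // applies every entry
    long follower = replay(log, 1);  // silently drops the entry at index 1
    System.out.println("leader=" + leader + " follower=" + follower);
  }
}
```

Both replicas consider the whole log applied, yet they end with different values, and nothing in the log itself records the difference.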





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r874300405


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   With the Ratis master branch code, TestStorageContainerManagerHA#testAllSCMAreRunning no longer crashes.  I will try to add more unit tests to verify whether SCM is safe during snapshot installation.





[GitHub] [ozone] github-actions[bot] closed pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.
URL: https://github.com/apache/ozone/pull/3403




[GitHub] [ozone] szetszwo commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r871178905


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @ChenSammi , SCM implements notifyInstallSnapshotFromLeader(..).  When the leader does not have the log entries, it sends an install snapshot notification to the follower.  Therefore, it cannot have any appendEntries calls during snapshot installation.  Are we sure the follower is installing a snapshot when it crashes?
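
The leader-side choice described here can be sketched as a simplified toy model. `ToyLeaderDecision`, `Action`, and both parameters are hypothetical; the real Ratis logic considers more conditions than this single comparison.

```java
// Hypothetical toy model of the leader-side decision; not Ratis code.
class ToyLeaderDecision {
  enum Action { APPEND_ENTRIES, NOTIFY_INSTALL_SNAPSHOT }

  // followerNextIndex: the next log index the follower needs.
  // leaderLogStartIndex: the first index still present in the leader's log.
  static Action nextAction(long followerNextIndex, long leaderLogStartIndex) {
    if (followerNextIndex < leaderLogStartIndex) {
      // The leader no longer has the entries the follower needs.
      return Action.NOTIFY_INSTALL_SNAPSHOT;
    }
    return Action.APPEND_ENTRIES;
  }

  public static void main(String[] args) {
    System.out.println(nextAction(5, 100));
    System.out.println(nextAction(100, 100));
  }
}
```

In this model the two actions are mutually exclusive for a given follower, which is the basis for the question of whether the follower was really installing a snapshot when it crashed.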





[GitHub] [ozone] adoroszlai commented on pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3403:
URL: https://github.com/apache/ozone/pull/3403#issuecomment-1194341299

   /pending Will try to add more unit tests to verify whether SCM is safe during install snapshot




[GitHub] [ozone] kerneltime commented on pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3403:
URL: https://github.com/apache/ozone/pull/3403#issuecomment-1379592782

   This bug still seems to be open and impacts 1.3. CC @ChenSammi @adoroszlai @errose28 




[GitHub] [ozone] ChenSammi closed pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi closed pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.
URL: https://github.com/apache/ozone/pull/3403




[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r870894580


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   Ratis calls StateMachine#pause before installing a snapshot, and calls StateMachine#reinitialize after the state machine has been reloaded.  In SCM's StateMachine#reinitialize, the state machine puts itself back into the RUNNING state.  All of this happens concurrently with the handling of other Ratis appendEntries requests, so we need to check the state machine state in applyTransaction.  Refer to the following crash stack.
   
   ![image](https://user-images.githubusercontent.com/19790142/167980105-4dd9edc5-f305-478c-9025-a0267d8fd6bd.png)
   
   
   The OM state machine implementation does not check the state machine state in applyTransaction, because all transaction output goes into OMDoubleBuffer, and we stop OMDoubleBuffer before reloading the DB and restart it afterwards.
   





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r870894580


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   Ratis calls StateMachine#pause before installing a snapshot, and calls StateMachine#reinitialize after the state machine has been reloaded.  In SCM's StateMachine#reinitialize, the state machine puts itself back into the RUNNING state.  All of this happens concurrently with the Ratis StateMachineUpdater, which calls applyTransaction, so we need to check the state machine state in applyTransaction.  Refer to the following crash stack.
   
   ![image](https://user-images.githubusercontent.com/19790142/167980105-4dd9edc5-f305-478c-9025-a0267d8fd6bd.png)
   
   
   The OM state machine implementation does not check the state machine state in applyTransaction, because all transaction output goes into OMDoubleBuffer, and we stop OMDoubleBuffer before reloading the DB and restart it afterwards.
   





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r871948640


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @szetszwo , my previous comment was misleading.  The follower crashed when Ratis called StateMachine#applyTransaction through StateMachineUpdater while the SCM state machine was installing a snapshot, not during appendEntries request handling.





[GitHub] [ozone] ChenSammi commented on a diff in pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on code in PR #3403:
URL: https://github.com/apache/ozone/pull/3403#discussion_r871948640


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java:
##########
@@ -133,6 +133,15 @@ public CompletableFuture<Message> applyTransaction(
       final TransactionContext trx) {
     final CompletableFuture<Message> applyTransactionFuture =
         new CompletableFuture<>();
+    if (getLifeCycleState().isPausingOrPaused()) {
+      // Statemachine is installation snapshot. Discard any request.

Review Comment:
   @szetszwo , sorry, my previous comment was misleading.  The follower crashed when Ratis called StateMachine#applyTransaction through StateMachineUpdater while the SCM state machine was installing a snapshot, not during appendEntries request handling.





[GitHub] [ozone] github-actions[bot] commented on pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #3403:
URL: https://github.com/apache/ozone/pull/3403#issuecomment-1216008428

   Thank you very much for the patch. I am closing this PR __temporarily__ as there was no activity recently and it is waiting for response from its author.
   
   It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.
   
   It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.
   
   If you need ANY help to finish this PR, please [contact the community](https://github.com/apache/hadoop-ozone#contact) on the mailing list or the slack channel.




[GitHub] [ozone] ChenSammi commented on pull request #3403: HDDS-6732. Follower SCM crashed during snapshot installation.

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on PR #3403:
URL: https://github.com/apache/ozone/pull/3403#issuecomment-1379749332

   The issue could no longer be reproduced with a newer Ratis version.

