You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/05/08 03:56:44 UTC

[GitHub] [ozone] GlenGeng opened a new pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

GlenGeng opened a new pull request #2224:
URL: https://github.com/apache/ozone/pull/2224


   ## What changes were proposed in this pull request?
   
   As mentioned in https://issues.apache.org/jira/browse/RATIS-1370
   During `notifyInstallSnapshotFromLeader`, `StateMachineUpdater` may call `applyTransactions` when StateMachine is in PAUSED state.
   Just divide snapshot related work into `notifyInstallSnapshotFromLeader` and `reinitialize` for `SCMStateMachine`.
   
   During `notifyInstallSnapshotFromLeader`, SCM just downloads snapshot but not modify StateMachine, since StateMachineUpdater is in RUNNING state, may call applyTransactions during this period.
   During `reinitialize`, SCM can safely reload the StateMachine, such as rocksdb and in-memory state. During this period, StateMachineUpdater is in RELOAD state, thus there will be no contention between SCM and StateMachineUpdater.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5173
   
   ## How was this patch tested?
   
   CI
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant merged pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #2224:
URL: https://github.com/apache/ozone/pull/2224


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng edited a comment on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng edited a comment on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836224580


   Hey @adoroszlai , a question about CI result, why the last two commits are passed, but the `checks` is not ? They seems to derive from different sources.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836279155


   Thanks @adoroszlai for the explanation. I will rebase the dev branch with latest master, to see if `TestRatisPipelineProvider ` can pass on the branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-837830673


   Thanks @GlenGeng for the work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836224580


   Hey @adoroszlai , a question about CI result, why the last two commits are passed, but the `checks` is not ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836267709


   > a question about CI result, why the last two commits are passed, but the `checks` is not ?
   
   Do you mean why the checks passed on your branch, but not in the PR?  It can happen for two reasons:
   
   1. PR checks are run against the "merge" commit, which includes recent changes from the target branch (`master`).  This source of difference can be eliminated (temporarily) by merging `master` into your branch, instead of triggering CI with empty commit.
   2. Flaky tests.
   
   I think in this case it is the second one.  `TestRatisPipelineProvider` has been failing occasionally, and it passes for me locally on your branch, as well as after a merge from `master`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836289420


   > Please prefer merge over rebase for open PRs. 
   
   Sure. Will take care of this in future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#discussion_r629138201



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -76,6 +78,11 @@
   private final boolean isInitialized;
   private ExecutorService installSnapshotExecutor;
 
+  // The atomic variable RaftServerImpl#inProgressInstallSnapshotRequest
+  // ensures serializable between notifyInstallSnapshotFromLeader()
+  // and reinitialize().
+  private DBCheckpoint installingDBCheckpoint;
+

Review comment:
       Once, the installSnaoshot completes, i guess, we need to set the "IntsallingCheckpoint" filed to null. Also, before the downloadSnaoshot icall is invoked, we should add a precondition check that installingCheckpoint is null always.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836286920


   > I will rebase the dev branch with latest master
   
   Please prefer merge over rebase for open PRs.  Rebase rewrites previous commits, so we lose CI history (or at least it is much harder to find).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#discussion_r629139192



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -296,10 +319,27 @@ public void pause() {
 
   @Override
   public void reinitialize() {
-    if (getLifeCycleState() == LifeCycle.State.PAUSED) {
-      getLifeCycle().transition(LifeCycle.State.STARTING);
-      getLifeCycle().transition(LifeCycle.State.RUNNING);
+    Preconditions.checkNotNull(installingDBCheckpoint);
+
+    TermIndex termIndex = null;
+    try {
+      termIndex =
+          scm.getScmHAManager().installCheckpoint(installingDBCheckpoint);
+    } catch (Exception e) {
+      LOG.error("Failed to reinitialize SCMStateMachine.");
+      return;
+    }
+
+    // re-initialize the DoubleBuffer and update the lastAppliedIndex.

Review comment:
       misleading comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2224: HDDS-5173. Divide snapshot related work into notifyInstallSnapshotFromLeader and reinitialize for SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2224:
URL: https://github.com/apache/ozone/pull/2224#issuecomment-836222901


   @bharatviswa504  @bshashikant could you please take a look at this PR ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org