You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@celeborn.apache.org by GitBox <gi...@apache.org> on 2022/11/30 12:28:44 UTC

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

AngersZhuuuu opened a new pull request, #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033

   ### What changes were proposed in this pull request?
   
   <img width="1091" alt="截屏2022-11-30 下午8 25 13" src="https://user-images.githubusercontent.com/46485123/204795829-eaa7e2a7-8051-4b8b-a907-5a795b526043.png">
   After install snapshot, will notify StateMachineUpdate to reload
   <img width="915" alt="截屏2022-11-30 下午8 26 12" src="https://user-images.githubusercontent.com/46485123/204796010-7c0b3539-6ca3-4fb8-8841-e8cd8fa293a2.png">
   In StateMachineUpdater `reload()` will check StateMachine status should be PAUSED, then reinitialize StateMachine with latest snapshot
   <img width="1077" alt="截屏2022-11-30 下午8 26 39" src="https://user-images.githubusercontent.com/46485123/204796186-90775a71-363f-4597-990c-208501a557c9.png">
   
   In ServerState.installSnapshot method there will call StateMachine's `pause()`
   ```
     void installSnapshot(InstallSnapshotRequestProto request) throws IOException {
       // TODO: verify that we need to install the snapshot
       StateMachine sm = server.getStateMachine();
       sm.pause(); // pause the SM to prepare for install snapshot
       snapshotManager.installSnapshot(sm, request);
       updateInstalledSnapshotIndex(TermIndex.valueOf(request.getSnapshotChunk().getTermIndex()));
     }
   ```
   
   We should implement `pause()` in StateMachine and change the status
   
   
   
   ### Why are the changes needed?
   
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   ### What are the items that need reviewer attention?
   
   
   ### Related issues.
   
   
   ### Related pull requests.
   
   
   ### How was this patch tested?
   
   
   /cc @related-reviewer
   
   /assign @main-reviewer
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#issuecomment-1332079934

   Tested in prod env..can work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on code in PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#discussion_r1036673800


##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/StateMachine.java:
##########
@@ -189,8 +191,16 @@ public void initialize(RaftServer server, RaftGroupId id, RaftStorage raftStorag
   @Override
   public void reinitialize() throws IOException {
     LOG.info("Reinitializing state machine.");
+    getLifeCycle().compareAndTransition(PAUSED, STARTING);
     storage.loadLatestSnapshot();
     loadSnapshot(storage.getLatestSnapshot());
+    getLifeCycle().compareAndTransition(STARTING, RUNNING);
+  }
+
+  @Override
+  public void pause() {

Review Comment:
   > why not just change from running to paused?
   
   <img width="595" alt="截屏2022-12-01 下午12 13 54" src="https://user-images.githubusercontent.com/46485123/204964366-1aa73838-fa78-4633-b693-95cb2c37dbb6.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] SzyWilliam commented on pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
SzyWilliam commented on PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#issuecomment-1332100974

   @AngersZhuuuu Thanks for working on this! I think we should also transition the state from pause to running in `reinitialize` hook. See the pause / reinitialize doc in Ratis: https://github.com/apache/ratis/blob/9d1e9895666936d911758d04b4ea3f8386694abe/ratis-docs/src/site/markdown/snapshot.md?plain=1#L159 and Ozone usage: https://github.com/apache/ozone/blob/829dfce410d6d8801421a87ac21922c5f767acda/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java#L388


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture merged pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
waitinfuture merged PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
waitinfuture commented on code in PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#discussion_r1036671451


##########
master/src/main/java/org/apache/celeborn/service/deploy/master/clustermeta/ha/StateMachine.java:
##########
@@ -189,8 +191,16 @@ public void initialize(RaftServer server, RaftGroupId id, RaftStorage raftStorag
   @Override
   public void reinitialize() throws IOException {
     LOG.info("Reinitializing state machine.");
+    getLifeCycle().compareAndTransition(PAUSED, STARTING);
     storage.loadLatestSnapshot();
     loadSnapshot(storage.getLatestSnapshot());
+    getLifeCycle().compareAndTransition(STARTING, RUNNING);
+  }
+
+  @Override
+  public void pause() {

Review Comment:
   why not just change from running to paused?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#issuecomment-1332083716

   @FMX @waitinfuture @SzyWilliam 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1033: [CELEBORN-94][BUG] StateMachine should implement pause to change status

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1033:
URL: https://github.com/apache/incubator-celeborn/pull/1033#issuecomment-1333069762

   > @AngersZhuuuu Thanks for working on this! I think we should also transition the state from pause to running in `reinitialize` hook. See the pause / reinitialize doc in Ratis: https://github.com/apache/ratis/blob/9d1e9895666936d911758d04b4ea3f8386694abe/ratis-docs/src/site/markdown/snapshot.md?plain=1#L159 and Ozone usage: https://github.com/apache/ozone/blob/829dfce410d6d8801421a87ac21922c5f767acda/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java#L388
   
   Updated! How about current.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org