Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/10/30 16:54:40 UTC

[GitHub] [incubator-seatunnel] TaoZex opened a new pull request, #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

TaoZex opened a new pull request, #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237

   ## Purpose of this pull request
   
   <!-- Describe the purpose of this pull request. For example: This pull request adds checkstyle plugin.-->
   
   ## Check list
   
   * [ ] Code changes are covered with tests, or do not need tests for the following reason:
   * [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] EricJoy2048 merged pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NPE

Posted by GitBox <gi...@apache.org>.
EricJoy2048 merged PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237




[GitHub] [incubator-seatunnel] CalvinKirs commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1008961334


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   This is not a good practice. @Hisoka-X do you have any good ideas?





[GitHub] [incubator-seatunnel] FWLamb commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
FWLamb commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1009005908


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);
 
         // pipeline will recovery running state
         await().atMost(200000, TimeUnit.MILLISECONDS)

Review Comment:
   Increasing the wait time here may help the test succeed; for example, raise it to 10 minutes.





[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1009030936


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   `Thread.sleep(5000)` is used to wait for the job to begin restoring and for the job status to change to a non-`RUNNING` status. I then use 
   
   ```
   // pipeline will recovery running state
           await().atMost(200000, TimeUnit.MILLISECONDS)
               .untilAsserted(
                   () -> Assertions.assertEquals(PipelineStatus.RUNNING,
                       server2.getCoordinatorService().getJobMaster(jobId).getPhysicalPlan().getPipelineList().get(0)
                           .getPipelineState()));
   
   ```
   
   to ensure that the job restore completes and the job status turns to `RUNNING` again.
   The problem is that we cannot be sure the job has begun restoring and that its status has changed to a non-`RUNNING` status. I tried using `await().atMost()` to check that the job status leaves `RUNNING`. However, I am worried that the non-`RUNNING` status may not last long enough for polling to observe it, so `await` would miss this state change.
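
The concern about a poll missing a short-lived non-`RUNNING` state can be illustrated with a minimal sketch that records every transition instead of sampling the current state. Everything here is hypothetical and not SeaTunnel code: the `StateHistory` class, its `record` hook, and the `RESTORING` state name are assumptions made only for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical illustration, not SeaTunnel code: keeps every state a pipeline passes through. */
final class StateHistory {

    // RESTORING is an assumed name standing in for "any non-RUNNING state".
    enum State { RUNNING, RESTORING }

    private final List<State> history = new CopyOnWriteArrayList<>();

    /** Would be called by whatever owns the state, once per transition. */
    void record(State state) {
        history.add(state);
    }

    /** True if a non-RUNNING state was recorded and RUNNING was recorded after it. */
    boolean leftRunningAndCameBack() {
        boolean leftRunning = false;
        for (State s : history) {
            if (s != State.RUNNING) {
                leftRunning = true;
            } else if (leftRunning) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StateHistory h = new StateHistory();
        h.record(State.RUNNING);
        h.record(State.RESTORING); // may last only a few milliseconds
        h.record(State.RUNNING);
        System.out.println(h.leftRunningAndCameBack());
    }
}
```

Because the history keeps the intermediate state, a test could assert on `leftRunningAndCameBack()` at any later time, however briefly the non-`RUNNING` state lasted. Whether the engine exposes a suitable hook for recording transitions is not established in this thread.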





[GitHub] [incubator-seatunnel] Hisoka-X commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1008980872


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   If we could capture the log to confirm that the job restore succeeded, we might be able to remove this code. For now, we don't have a good way to do that.
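
As a rough sketch of the "catch the log" idea, a test could attach a handler that releases a latch when a given message appears, then block on that latch instead of sleeping. This example uses `java.util.logging` only to stay self-contained; the logging framework the engine actually uses, the `LogWatcher` class, the logger name, and the "restore complete" message are all assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: releases a latch when a log message containing a marker is published. */
final class LogWatcher extends Handler {
    private final String marker;
    private final CountDownLatch seen = new CountDownLatch(1);

    LogWatcher(String marker) {
        this.marker = marker;
    }

    @Override
    public void publish(LogRecord record) {
        String msg = record.getMessage();
        if (msg != null && msg.contains(marker)) {
            seen.countDown();
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    /** Waits up to timeoutMillis for the marker; returns false on timeout or interrupt. */
    boolean await(long timeoutMillis) {
        try {
            return seen.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo.restore"); // assumed logger name
        logger.setUseParentHandlers(false);               // keep the demo output quiet
        LogWatcher watcher = new LogWatcher("restore complete"); // assumed message text
        logger.addHandler(watcher);
        new Thread(() -> logger.info("job 1 restore complete")).start();
        System.out.println(watcher.await(2_000));
        logger.removeHandler(watcher);
    }
}
```

This couples the test to log wording, so it would break if the message changed; that trade-off is worth weighing against a fixed sleep.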





[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1009030936


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   `Thread.sleep(5000)` is used to wait for the job to begin restoring and for the job status to change to a non-`RUNNING` status. I then use 
   
   ```
   // pipeline will recovery running state
           await().atMost(200000, TimeUnit.MILLISECONDS)
               .untilAsserted(
                   () -> Assertions.assertEquals(PipelineStatus.RUNNING,
                       server2.getCoordinatorService().getJobMaster(jobId).getPhysicalPlan().getPipelineList().get(0)
                           .getPipelineState()));
   
   ```
   
   to ensure that the job restore completes and the job status turns to `RUNNING` again.
   The problem is that we cannot be sure the job has begun restoring and that its status has changed to a non-`RUNNING` status. I suggest using `await().atMost()` to check that the job status has left `RUNNING`.





[GitHub] [incubator-seatunnel] TaoZex commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
TaoZex commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1009008918


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   I agree with you. I think the problem is the call `getJobMaster(jobId)`, which returns null when the master has not yet switched successfully.
   If we can't confirm for now that the job restore succeeded, we can either temporarily discard it or set `Thread.sleep` to a large value.
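
One way to picture the null case described above is a polling loop that treats a null lookup result as "not ready yet" rather than dereferencing it. The following is a minimal sketch in plain Java, not the test's actual code: `NullSafePolling`, `waitFor`, and the simulated lookup are hypothetical stand-ins for a call chain such as `getJobMaster(jobId)` followed by further getters.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper, not part of SeaTunnel: polls a nullable lookup until it yields the expected value. */
final class NullSafePolling {

    /**
     * Returns true once the lookup yields a value equal to expected.
     * A null result counts as "keep waiting" instead of causing a NullPointerException.
     */
    static <T> boolean waitFor(Supplier<T> lookup, T expected, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T current = lookup.get();
            if (expected.equals(current)) { // equals(null) is false, so a null result is safe
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated lookup: null for the first three calls (master not switched yet), then "RUNNING".
        Supplier<String> lookup = () -> calls.incrementAndGet() <= 3 ? null : "RUNNING";
        System.out.println(waitFor(lookup, "RUNNING", 2_000, 10));
    }
}
```

In a real test the supplier would need its own null check before chaining further getters on the looked-up object; the sketch only shows that a not-yet-available result can be retried instead of failing the test.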





[GitHub] [incubator-seatunnel] hailin0 commented on pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NPE

Posted by GitBox <gi...@apache.org>.
hailin0 commented on PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#issuecomment-1296692163

   LGTM




[GitHub] [incubator-seatunnel] TaoZex commented on a diff in pull request #3237: [Hotfix][e2e] fix JobRestoreWhenMasterNodeSwitch method NP

Posted by GitBox <gi...@apache.org>.
TaoZex commented on code in PR #3237:
URL: https://github.com/apache/incubator-seatunnel/pull/3237#discussion_r1009008918


##########
seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/CoordinatorServiceTest.java:
##########
@@ -154,7 +154,7 @@ public void testJobRestoreWhenMasterNodeSwitch() throws InterruptedException {
             });
 
         // wait job restore
-        Thread.sleep(5000);
+        Thread.sleep(30000);

Review Comment:
   I agree with you. I think the problem is the call `getJobMaster(jobId)`, which returns null when the master has not yet switched successfully.
   If we can't confirm for now that the job restore succeeded, we can either temporarily disable it or set `Thread.sleep` to a large value.




