Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/05 11:25:24 UTC

[GitHub] [flink] wanglijie95 opened a new pull request, #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

wanglijie95 opened a new pull request, #20169:
URL: https://github.com/apache/flink/pull/20169

   ## What is the purpose of the change
   [FLINK-28391][runtime] Fix unstable test `DefaultBlocklistHandlerTest#testRemoveTimeoutNodes`
   
   ## Verifying this change
   This change is already covered by existing tests
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (**no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**no**)
     - The serializers: (**no**)
     - The runtime per-record code paths (performance sensitive): (**no**)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (**no**)
     - The S3 file system connector: (**no**)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**no**)
     - If yes, how is the feature documented? (**not applicable**)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] zhuzhurk commented on a diff in pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on code in PR #20169:
URL: https://github.com/apache/flink/pull/20169#discussion_r914440062


##########
flink-runtime/src/test/java/org/apache/flink/runtime/blocklist/DefaultBlocklistHandlerTest.java:
##########
@@ -166,8 +166,8 @@ public void blockResources(Collection<BlockedNode> blockedNodes) {
         }
 
         @Override
-        public void unblockResources(Collection<BlockedNode> unBlockedNodes) {
-            allUnBlockedNodes.addAll(unBlockedNodes);
+        public void unblockResources(Collection<BlockedNode> unblockedNodes) {
+            allUnBlockedNodes.addAll(unblockedNodes);

Review Comment:
   allUnBlockedNodes -> allUnblockedNodes
   
   I can also see other occurrences in DefaultBlocklistHandlerTest.





[GitHub] [flink] wanglijie95 commented on a diff in pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
wanglijie95 commented on code in PR #20169:
URL: https://github.com/apache/flink/pull/20169#discussion_r914450379


##########
flink-runtime/src/test/java/org/apache/flink/runtime/blocklist/DefaultBlocklistHandlerTest.java:
##########
@@ -166,8 +166,8 @@ public void blockResources(Collection<BlockedNode> blockedNodes) {
         }
 
         @Override
-        public void unblockResources(Collection<BlockedNode> unBlockedNodes) {
-            allUnBlockedNodes.addAll(unBlockedNodes);
+        public void unblockResources(Collection<BlockedNode> unblockedNodes) {
+            allUnBlockedNodes.addAll(unblockedNodes);

Review Comment:
   Fixed





[GitHub] [flink] wanglijie95 commented on a diff in pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
wanglijie95 commented on code in PR #20169:
URL: https://github.com/apache/flink/pull/20169#discussion_r914404384


##########
flink-runtime/src/test/java/org/apache/flink/runtime/blocklist/DefaultBlocklistHandlerTest.java:
##########
@@ -77,24 +87,53 @@ void testAddNewBlockedNodes() throws Exception {
 
     @Test
     void testRemoveTimeoutNodes() throws Exception {
+        final TestingComponentMainThreadExecutor mainThreadExecutor =
+                new TestingComponentMainThreadExecutor(
+                        ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(
+                                EXECUTOR_EXTENSION.getExecutor()));
+
         long currentTimestamp = System.currentTimeMillis();
         BlockedNode node1 = new BlockedNode("node1", "cause", currentTimestamp + 1000L);
         BlockedNode node2 = new BlockedNode("node2", "cause", currentTimestamp + 3000L);
 
         TestBlocklistContext context = new TestBlocklistContext();
-        try (DefaultBlocklistHandler handler = createDefaultBlocklistHandler(context)) {
-
-            handler.addNewBlockedNodes(Arrays.asList(node1, node2));
-            assertThat(handler.getAllBlockedNodeIds()).hasSize(2);
-            assertThat(context.allUnBlockedNodes).hasSize(0);
+        try (DefaultBlocklistHandler handler =
+                createDefaultBlocklistHandler(
+                        context, mainThreadExecutor.getMainThreadExecutor())) {
+            mainThreadExecutor.execute(
+                    () -> {
+                        handler.addNewBlockedNodes(Arrays.asList(node1, node2));
+                        assertThat(handler.getAllBlockedNodeIds()).hasSize(2);
+                        assertThat(context.allUnBlockedNodes).hasSize(0);
+                    });
 
             // wait node1 timeout
-            CommonTestUtils.waitUntilCondition(() -> handler.getAllBlockedNodeIds().size() == 1);
-            assertThat(context.allUnBlockedNodes).containsExactly(node1);
+            CommonTestUtils.waitUntilCondition(
+                    () ->
+                            mainThreadExecutor.execute(
+                                            () -> {
+                                                int nodes = handler.getAllBlockedNodeIds().size();
+                                                if (nodes == 1) {
+                                                    assertThat(context.allUnBlockedNodes)
+                                                            .containsExactly(node1);
+                                                }
+                                                return nodes;
+                                            })
+                                    == 1);
 
             // wait node2 timeout
-            CommonTestUtils.waitUntilCondition(() -> handler.getAllBlockedNodeIds().size() == 0);
-            assertThat(context.allUnBlockedNodes).containsExactly(node1, node2);
+            CommonTestUtils.waitUntilCondition(

Review Comment:
   You are right. I've changed it with reference to `TaskExecutorManagerTest#testTimeoutForUnusedTaskManager`; please have a look.
   
   





[GitHub] [flink] wanglijie95 commented on pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
wanglijie95 commented on PR #20169:
URL: https://github.com/apache/flink/pull/20169#issuecomment-1174949883

   Could you help review this fix at your convenience, @zhuzhurk?




[GitHub] [flink] zhuzhurk commented on pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on PR #20169:
URL: https://github.com/apache/flink/pull/20169#issuecomment-1176160923

   Merging.




[GitHub] [flink] flinkbot commented on pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #20169:
URL: https://github.com/apache/flink/pull/20169#issuecomment-1174955629

   ## CI report:
   
   * 3b6c744e79eaa2559d60ce01b6c397f2bd350232 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>




[GitHub] [flink] zhuzhurk commented on a diff in pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
zhuzhurk commented on code in PR #20169:
URL: https://github.com/apache/flink/pull/20169#discussion_r914370881


##########
flink-runtime/src/test/java/org/apache/flink/runtime/blocklist/DefaultBlocklistHandlerTest.java:
##########
@@ -77,24 +87,53 @@ void testAddNewBlockedNodes() throws Exception {
 
     @Test
     void testRemoveTimeoutNodes() throws Exception {
+        final TestingComponentMainThreadExecutor mainThreadExecutor =
+                new TestingComponentMainThreadExecutor(
+                        ComponentMainThreadExecutorServiceAdapter.forSingleThreadExecutor(
+                                EXECUTOR_EXTENSION.getExecutor()));
+
         long currentTimestamp = System.currentTimeMillis();
         BlockedNode node1 = new BlockedNode("node1", "cause", currentTimestamp + 1000L);
         BlockedNode node2 = new BlockedNode("node2", "cause", currentTimestamp + 3000L);
 
         TestBlocklistContext context = new TestBlocklistContext();
-        try (DefaultBlocklistHandler handler = createDefaultBlocklistHandler(context)) {
-
-            handler.addNewBlockedNodes(Arrays.asList(node1, node2));
-            assertThat(handler.getAllBlockedNodeIds()).hasSize(2);
-            assertThat(context.allUnBlockedNodes).hasSize(0);
+        try (DefaultBlocklistHandler handler =
+                createDefaultBlocklistHandler(
+                        context, mainThreadExecutor.getMainThreadExecutor())) {
+            mainThreadExecutor.execute(
+                    () -> {
+                        handler.addNewBlockedNodes(Arrays.asList(node1, node2));
+                        assertThat(handler.getAllBlockedNodeIds()).hasSize(2);
+                        assertThat(context.allUnBlockedNodes).hasSize(0);
+                    });
 
             // wait node1 timeout
-            CommonTestUtils.waitUntilCondition(() -> handler.getAllBlockedNodeIds().size() == 1);
-            assertThat(context.allUnBlockedNodes).containsExactly(node1);
+            CommonTestUtils.waitUntilCondition(
+                    () ->
+                            mainThreadExecutor.execute(
+                                            () -> {
+                                                int nodes = handler.getAllBlockedNodeIds().size();
+                                                if (nodes == 1) {
+                                                    assertThat(context.allUnBlockedNodes)
+                                                            .containsExactly(node1);
+                                                }
+                                                return nodes;
+                                            })
+                                    == 1);
 
             // wait node2 timeout
-            CommonTestUtils.waitUntilCondition(() -> handler.getAllBlockedNodeIds().size() == 0);
-            assertThat(context.allUnBlockedNodes).containsExactly(node1, node2);
+            CommonTestUtils.waitUntilCondition(

Review Comment:
   I'm afraid the test can still be unstable: if the environment is very slow, both nodes may already be unblocked by the time the first check (nodes == 1) runs, so the condition is never observed.
   I would propose changing the test to get the job to one stable status and do the first check, and then take some action to get the job to another stable status and do the second check.
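
A minimal sketch of the "stable status, then check" idea, assuming the current time is passed in explicitly instead of being read from the wall clock. `SketchBlocklist`, `removeTimedOut`, and `DeterministicTimeoutSketch` are hypothetical names for illustration only; they are not Flink's `DefaultBlocklistHandler` API and not necessarily what the PR ended up doing. Because the test itself decides when time advances, the intermediate state with exactly one blocked node cannot be skipped on a slow machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a blocklist with per-node timeouts (not Flink code). */
class SketchBlocklist {
    // nodeId -> end timestamp; insertion order is preserved.
    private final Map<String, Long> endTimestamps = new LinkedHashMap<>();
    // Nodes reported as unblocked, in the order they timed out.
    final List<String> unblocked = new ArrayList<>();

    void block(String nodeId, long endTimestamp) {
        endTimestamps.put(nodeId, endTimestamp);
    }

    /** Removes every node whose end timestamp is at or before {@code now}. */
    void removeTimedOut(long now) {
        Iterator<Map.Entry<String, Long>> it = endTimestamps.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= now) {
                unblocked.add(entry.getKey());
                it.remove();
            }
        }
    }

    int size() {
        return endTimestamps.size();
    }
}

class DeterministicTimeoutSketch {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        SketchBlocklist blocklist = new SketchBlocklist();
        blocklist.block("node1", 1000L);
        blocklist.block("node2", 3000L);

        // Stable status 1: move time to node1's timeout only, then check.
        blocklist.removeTimedOut(1000L);
        check(blocklist.size() == 1, "only node2 should remain blocked");
        check(blocklist.unblocked.equals(Arrays.asList("node1")), "node1 unblocked");

        // Stable status 2: move time to node2's timeout, then check again.
        blocklist.removeTimedOut(3000L);
        check(blocklist.size() == 0, "no node should remain blocked");
        check(
                blocklist.unblocked.equals(Arrays.asList("node1", "node2")),
                "node1 and node2 unblocked in order");
    }
}
```

In contrast, a test that polls for `size() == 1` against real time depends on the poll happening inside the window between the two timeouts.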





[GitHub] [flink] zhuzhurk closed pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes

Posted by GitBox <gi...@apache.org>.
zhuzhurk closed pull request #20169: [FLINK-28391][runtime] Fix unstable test DefaultBlocklistHandlerTest#testRemoveTimeoutNodes
URL: https://github.com/apache/flink/pull/20169

