You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2022/01/31 17:43:52 UTC

[GitHub] [geode] albertogpz opened a new pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

albertogpz opened a new pull request #7330:
URL: https://github.com/apache/geode/pull/7330


   …terCommandStarted
   
   A sleep has been added before starting the second group of puts
   in the testCommandDoesNotCopyEntriesUpdatedAfterCommandStarted
   test case so that we can be more certain that when
   puts start, the command has effectively started its execution.
   
   <!-- Thank you for submitting a contribution to Apache Geode. -->
   
   <!-- In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken: 
   -->
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   <!-- Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to dev@geode.apache.org.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812174465



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       > @albertogpz I have not considered working on any of them at the moment. So feel free to pick up any of them. If you feel you have gotten stuck or need to chat to anyone about this, please feel free to reach out to me directly.
   > 
   > [GEODE-10071](https://issues.apache.org/jira/browse/GEODE-10071) and [GEODE-10068](https://issues.apache.org/jira/browse/GEODE-10068) should be reasonably simple and to apply
   > 
   > [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) and [GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) would be good tasks to complete to improve overall simplicity and performance of the current implementation
   > 
   > [GEODE-10070](https://issues.apache.org/jira/browse/GEODE-10070) is a task that we need to carefully consider how it is implemented. Please feel free to reach out to the "experts" on this via the dev-list. Completing this task will be a huge step in the right direction when it comes to SoC and modularity
   
   @kohlmu-pivotal I will try to find some time to work on some of these items.
   I consider [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) and [GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) the most important ones.
   I was thinking that maybe the first one could be solved by the proposal on the second one, i.e., calling `onRegion()` from one of the members so that when the function is executed in the corresponding member it could iterate over the keys in the filter and copy the entries by getting them without deserializing them.
    
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r796909436



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @agingade, let me try to explain why this is not an bug in the command but a problem with timing in the test case.
   
   At the beginning, the test puts 50000 entries whose keys are in the range of 0-49999.
   
   Then, the wan-copy region command is executed and, in parallel, other puts and deletes are done. The keys for these entries puts/deleted are in the range of 50000-52000 (no intersection with the first entries put).
   
   The test tries to verify that only the entries put at the beginning are copied (none of the entries put while the command is running). It does so by verifying that the number of entries copied is exactly 50000 as reported by the command. If any entry put while the command is running is copied by the command, we should see that the number of entries copied is greater than 50000, which is what we see when the test fails (50001). But the test needs to make sure that the puts for the 50000-52000 range are done after the copy command has started and not before. Otherwise, we could get a failure of the test (which is what we see sometimes).
   
   The way to make sure that the second group of puts starts after the command has started is by means of the `await()` you mention. But when the `await()` returns, it only guarantees that the `Callable` that will execute the command has been submitted to the pool (see `WanCopyRegionFunctionService.execute()`) although the command may not have effectively started.
   
   At that point, random operations (puts and deletes) can be started. Let's say at t=x, entry with key=50000 is put.
   
   The time at which the Callable object that in turn calls the `WanCopyRegionFunctionDelegate.wanCopyRegion()`is executed depends on the scheduling. Let's say it executes its first line at t=x + 1. At that point, the current time is recorded as (`functionStartTimestamp`). and a sleep of 500ms will be done before entries are started to be copied.
   
   Then the command starts to copy entries. Any entry with a timestamp greater than `functionStartTimestamp` will not be copied. In this case, the timestamp for entry with key=50000 is x and given that `functionStartTimestamp` is x+1, the entry will be copied. This would not point to an bug in the command. Rather, that the entry with key=50000 was put before the command was started.
   
   With the 100ms extra added to the wait we can be more certain that second group of puts will be started after the `functionStartTimestamp` is recorded and therefore, the timestamp for the entries of the second group of puts will be higher than the `functionStartTimestamp`.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811346516



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @kohlmu-pivotal Thanks for the improvements suggested. Are you planning on taking any of them? I am saying in order to tackle whichever you are not taking or all if you take none.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r808736289



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @agingade Did my explanation answer your concerns?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812112607



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       > @albertogpz a possible way to test this is to confirm that the two clusters have the same entries, rather than comparing against a counter that is constantly moving
   
   @kohlmu-pivotal I have changed the way to check that the command started copying entries by looking at the size of the remote site. If the remote site has at least one entry, it means that the command already started.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz merged pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz merged pull request #7330:
URL: https://github.com/apache/geode/pull/7330


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kohlmu-pivotal commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
kohlmu-pivotal commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r810430356



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @albertogpz I've looked at the code relating to the test and the functionality. I don't believe that the `Thread.sleep` should be the only recourse in resolving the flakiness of the test. I have raised [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) [GEODE-10068](https://issues.apache.org/jira/browse/GEODE-10068) [GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) [GEODE-10070](https://issues.apache.org/jira/browse/GEODE-10070) [GEODE-10071](https://issues.apache.org/jira/browse/GEODE-10071) as starting points on how to possibly improve the underlying feature. With some improvements in the underlying implementation might help in resolving some of the flakiness of the test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] agingade commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
agingade commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r796772614



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       Not sure how this is going to fix the issue. The await() has poll-delay of 100msecs.
   Also the error is saying its expected 50000 but it was 50001
   Some how there is an extra entry that is copied than the expected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r812120574



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       > @agingade, let me try to explain why this is not an bug in the command but a problem with timing in the test case.
   > 
   > At the beginning, the test puts 50000 entries whose keys are in the range of 0-49999.
   > 
   > Then, the wan-copy region command is executed and, in parallel, other puts and deletes are done. The keys for these entries puts/deleted are in the range of 50000-52000 (no intersection with the first entries put).
   > 
   > The test tries to verify that only the entries put at the beginning are copied (none of the entries put while the command is running). It does so by verifying that the number of entries copied is exactly 50000 as reported by the command. If any entry put while the command is running is copied by the command, we should see that the number of entries copied is greater than 50000, which is what we see when the test fails (50001). But the test needs to make sure that the puts for the 50000-52000 range are done after the copy command has started and not before. Otherwise, we could get a failure of the test (which is what we see sometimes).
   > 
   > The way to make sure that the second group of puts starts after the command has started is by means of the `await()` you mention. But when the `await()` returns, it only guarantees that the `Callable` that will execute the command has been submitted to the pool (see `WanCopyRegionFunctionService.execute()`) although the command may not have effectively started.
   > 
   > At that point, random operations (puts and deletes) can be started. Let's say at t=x, entry with key=50000 is put.
   > 
   > The time at which the Callable object that in turn calls the `WanCopyRegionFunctionDelegate.wanCopyRegion()`is executed depends on the scheduling. Let's say it executes its first line at t=x + 1. At that point, the current time is recorded as (`functionStartTimestamp`). and a sleep of 500ms will be done before entries are started to be copied.
   > 
   > Then the command starts to copy entries. Any entry with a timestamp greater than `functionStartTimestamp` will not be copied. In this case, the timestamp for entry with key=50000 is x and given that `functionStartTimestamp` is x+1, the entry will be copied. This would not point to an bug in the command. Rather, that the entry with key=50000 was put before the command was started.
   > 
   > With the 100ms extra added to the wait we can be more certain that second group of puts will be started after the `functionStartTimestamp` is recorded and therefore, the timestamp for the entries of the second group of puts will be higher than the `functionStartTimestamp`.
   
   @agingade I have removed the sleep and changed the check to avoid the flakyness by looking at the region size at the remote site.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kohlmu-pivotal commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
kohlmu-pivotal commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811473958



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @albertogpz I have not considered working on any of them at the moment. So feel free to pick up any of them. If you feel you have gotten stuck or need to chat to anyone about this, please feel free to reach out to me directly.
   
   [GEODE-10071](https://issues.apache.org/jira/browse/GEODE-10071) and [GEODE-10068](https://issues.apache.org/jira/browse/GEODE-10068) should be reasonably simple and to apply
   
   [GEODE-10067](https://issues.apache.org/jira/browse/GEODE-10067) and [GEODE-10069](https://issues.apache.org/jira/browse/GEODE-10069) would be good tasks to complete to improve overall simplicity and performance of the current implementation
   
   [GEODE-10070](https://issues.apache.org/jira/browse/GEODE-10070) is a task that we need to carefully consider how it is implemented. Please feel free to reach out to the "experts" on this via the dev-list. Completing this task will be a huge step in the right direction when it comes to SoC and modularity




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kohlmu-pivotal commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
kohlmu-pivotal commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811474400



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       @albertogpz a possible way to test this is to confirm that the two clusters have the same entries, rather than comparing against a counter that is constantly moving




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kohlmu-pivotal commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
kohlmu-pivotal commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r810430563



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       This should not be the only way we resolve the flakiness of the test. We need to take a step back and possibly have a different approach to the resolution




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] albertogpz commented on a change in pull request #7330: GEODE-9984: Avoid flakyness of testCommandDoesNotCopyEntriesUpdatedAf…

Posted by GitBox <gi...@apache.org>.
albertogpz commented on a change in pull request #7330:
URL: https://github.com/apache/geode/pull/7330#discussion_r811345803



##########
File path: geode-wan/src/distributedTest/java/org/apache/geode/cache/wan/internal/cli/commands/WanCopyRegionCommandDUnitTest.java
##########
@@ -1460,11 +1460,15 @@ public void createServersAndRegions(int locatorPort, List<VM> servers,
   }
 
   private void waitForWanCopyRegionCommandToStart(boolean useParallel, boolean usePartitionedRegion,
-      List<VM> servers) {
+      List<VM> servers) throws InterruptedException {
+    // Wait for the command execution to be registered in the service
     final int executionsExpected = useParallel && usePartitionedRegion ? servers.size() : 1;
     await().untilAsserted(
         () -> assertThat(getNumberOfCurrentExecutionsInServers(servers))
             .isEqualTo(executionsExpected));
+    // Wait some extra milliseconds to allow for the command to actually start to
+    // avoid flakyness in the tests.
+    Thread.sleep(100);

Review comment:
       Thanks for the comment, @kohlmu-pivotal . I will try to think about a different approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org