You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2020/12/10 23:39:25 UTC

[GitHub] [geode] kamilla1201 opened a new pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

kamilla1201 opened a new pull request #5839:
URL: https://github.com/apache/geode/pull/5839


   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to dev@geode.apache.org.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt commented on a change in pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

Posted by GitBox <gi...@apache.org>.
bschuchardt commented on a change in pull request #5839:
URL: https://github.com/apache/geode/pull/5839#discussion_r546773728



##########
File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeaveJUnitTest.java
##########
@@ -1441,24 +1435,74 @@ public void testCoordinatorFindRequestSuccess() throws Exception {
   public void testCoordinatorFindRequestFailure() throws Exception {
     try {
       initMocks(false);
-      HashSet<MemberIdentifier> registrants = new HashSet<>();
-      registrants.add(mockMembers[0]);
-      FindCoordinatorResponse fcr = new FindCoordinatorResponse(mockMembers[0], mockMembers[0],
-          false, null, registrants, false, true, null);
+      mockRequestToServer(eq(new HostAndPort("localhost", 12346)));
       GMSMembershipView view = createView();
       JoinResponseMessage jrm = new JoinResponseMessage(mockMembers[0], view, 0);
       gmsJoinLeave.setJoinResponseMessage(jrm);
 
-      when(locatorClient.requestToServer(eq(new HostAndPort("localhost", 12346)),
-          isA(FindCoordinatorRequest.class), anyInt(), anyBoolean()))
-              .thenReturn(fcr);
-
-      assertFalse("Should not be able to join ", gmsJoinLeave.join());
+      assertThatThrownBy(() -> gmsJoinLeave.join())
+          .isInstanceOf(MembershipConfigurationException.class);
     } finally {
-
     }
   }
 
+  @Test
+  public void testJoinFailureWhenSleepInterrupted() throws Exception {
+    initMocks(false);
+    mockRequestToServer(isA(HostAndPort.class));
+
+    when(mockConfig.getMemberTimeout()).thenReturn(100L);
+    when(mockConfig.getJoinTimeout()).thenReturn(1000L);
+
+    GMSJoinLeave spyGmsJoinLeave = spy(gmsJoinLeave);
+    when(spyGmsJoinLeave.hasCoordinatorJoinedCluster(-1, GMSJoinLeave.JOIN_RETRY_SLEEP))
+        .thenThrow(new InterruptedException());

Review comment:
       I like it - using a "when" to throw an InterruptedException

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/GMSMembership.java
##########
@@ -565,12 +565,7 @@ private void join() throws MemberStartupException {
         this.isJoining = true; // added for bug #44373
 
         // connect
-        boolean ok = services.getJoinLeave().join();
-
-        if (!ok) {
-          throw new MembershipConfigurationException("Unable to join the distributed system.  "
-              + "Operation either timed out, was stopped or Locator does not exist.");
-        }
+        services.getJoinLeave().join();

Review comment:
       nice change to make this a "void" method that throws an exception if it doesn't succeed in joining the cluster

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
##########
@@ -428,6 +432,24 @@ public boolean join() throws MemberStartupException {
     }
   }
 
+  boolean hasCoordinatorJoinedCluster(int viewId, long retrySleep)

Review comment:
       How about naming this something indicating that a sleep is going to be performed, like "pauseIfThereIsNoCoordinator"?

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/interfaces/JoinLeave.java
##########
@@ -29,7 +29,7 @@
    * joins the distributed system and returns true if successful, false if not. Throws

Review comment:
       This comment needs to be changed to reflect that the method now throws an exception if it's unable to join the cluster.  The comment should say which exception is thrown.  Should the exception be changed to be a checked exception instead of a runtime exception?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt merged pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

Posted by GitBox <gi...@apache.org>.
bschuchardt merged pull request #5839:
URL: https://github.com/apache/geode/pull/5839


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt commented on a change in pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

Posted by GitBox <gi...@apache.org>.
bschuchardt commented on a change in pull request #5839:
URL: https://github.com/apache/geode/pull/5839#discussion_r553478372



##########
File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeaveJUnitTest.java
##########
@@ -115,11 +117,18 @@ public void initMocks(boolean enableNetworkPartition) throws Exception {
 
   public void initMocks(boolean enableNetworkPartition, boolean useTestGMSJoinLeave)
       throws Exception {
+    String locator = "localhost[12345]";
+    initMocks(enableNetworkPartition, useTestGMSJoinLeave, locator, locator);
+  }
+
+  public void initMocks(boolean enableNetworkPartition, boolean useTestGMSJoinLeave,
+      String locators, String startLoactor)

Review comment:
       typo:  "startLocator"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] echobravopapa commented on a change in pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

Posted by GitBox <gi...@apache.org>.
echobravopapa commented on a change in pull request #5839:
URL: https://github.com/apache/geode/pull/5839#discussion_r544649359



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
##########
@@ -428,6 +432,24 @@ public boolean join() throws MemberStartupException {
     }
   }
 
+  boolean hasCoordinatorJoinedCluster(int viewId, long retrySleep)

Review comment:
       looks like this extraction was test driven

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
##########
@@ -383,40 +383,44 @@ public boolean join() throws MemberStartupException {
             break;
           }
         }
-        try {
-          if (found && !state.hasContactedAJoinedLocator) {
-            // if locators are restarting they may be handing out IDs from a stale view that
-            // we should go through quickly. Otherwise we should sleep a bit to let failure
-            // detection select a new coordinator
-            if (state.possibleCoordinator.getVmViewId() < 0) {
-              logger.debug("sleeping for {} before making another attempt to find the coordinator",
-                  retrySleep);
-              Thread.sleep(retrySleep);
-            } else {
+        if (found && !state.hasContactedAJoinedLocator) {
+          try {
+            if (hasCoordinatorJoinedCluster(state.possibleCoordinator.getVmViewId(), retrySleep)) {
               // since we were given a coordinator that couldn't be used we should keep trying
               tries = 0;
               giveupTime = System.currentTimeMillis() + timeout;
             }
+          } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+            throw new MembershipConfigurationException(
+                "Retry sleep interrupted. Giving up on joining the distributed system.");
           }
-        } catch (InterruptedException e) {
-          logger.debug("retry sleep interrupted - giving up on joining the distributed system");
-          return false;
         }
       } // for
 
       if (!this.isJoined) {
         logger.debug("giving up attempting to join the distributed system after "
             + (System.currentTimeMillis() - startTime) + "ms");
-      }
 
-      // to preserve old behavior we need to throw a MemberStartupException if
-      // unable to contact any of the locators
-      if (!this.isJoined && state.hasContactedAJoinedLocator) {
-        throw new MemberStartupException("Unable to join the distributed system in "
-            + (System.currentTimeMillis() - startTime) + "ms");
-      }
+        // to preserve old behavior we need to throw a MemberStartupException if
+        // unable to contact any of the locators
+        if (state.hasContactedAJoinedLocator) {
+          throw new MemberStartupException("Unable to join the distributed system in "
+              + (System.currentTimeMillis() - startTime) + "ms");
+        }
 
-      return this.isJoined;
+        if (state.locatorsContacted == 0) {
+          throw new MembershipConfigurationException(
+              "Unable to join the distributed system. Could not contact any of the locators: "

Review comment:
       as I'm understanding this change, this section is the critical change to improve upon error reporting... I don't see unit testing for the improvement; there is a new test that looks for the above Exception string. Only it seems that a test could validate the expanded error information has been sent as well.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt merged pull request #5839: GEODE-7861: Improve error reporting in GMSJoinLeave.join()

Posted by GitBox <gi...@apache.org>.
bschuchardt merged pull request #5839:
URL: https://github.com/apache/geode/pull/5839


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org