Posted to issues@geode.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/09/10 16:35:00 UTC

[jira] [Commented] (GEODE-8473) Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause

    [ https://issues.apache.org/jira/browse/GEODE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193715#comment-17193715 ] 

ASF GitHub Bot commented on GEODE-8473:
---------------------------------------

Bill commented on a change in pull request #5491:
URL: https://github.com/apache/geode/pull/5491#discussion_r486480775



##########
File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/GMSMembershipJUnitTest.java
##########
@@ -396,6 +399,18 @@ public void testMulticastAllowedWithNewVersionViewMember() {
     assertThat(manager.getGMSManager().isMulticastAllowed()).isTrue();
   }
 
+  @Test
+  public void membershipInvokesUpstreamListenerDuringForcedDisconnect() {
+    // have an exception interrupt the shutdown process and ensure that a thread is
+    // launched to inform the cache of shutdown
+    IllegalStateException expectedException = new IllegalStateException();
+    doThrow(expectedException).when(services).emergencyClose();
+    assertThatThrownBy(() -> manager.uncleanShutdown("For testing",
+        new MemberDisconnectedException("For Testing")))
+            .isEqualTo(expectedException);
+    verify(listener).membershipFailure(isA(String.class), isA(Throwable.class));
+  }
+

Review comment:
       you tested it all!

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/GMSMembership.java
##########
@@ -1267,25 +1267,27 @@ public void shutdown() {
   public void uncleanShutdown(String reason, final Exception e) {
     inhibitForcedDisconnectLogging(false);
 
-    if (services.getShutdownCause() == null) {
-      services.setShutdownCause(e);
-    }
-
-    if (cleanupTimer != null && !cleanupTimer.isShutdown()) {
-      cleanupTimer.shutdownNow();
-    }
+    try {
+      if (services.getShutdownCause() == null) {
+        services.setShutdownCause(e);
+      }
 
-    lifecycleListener.disconnect(e);
+      if (cleanupTimer != null && !cleanupTimer.isShutdown()) {
+        cleanupTimer.shutdownNow();
+      }
 
-    // first shut down communication so we don't do any more harm to other
-    // members
-    services.emergencyClose();
+      lifecycleListener.disconnect(e);
 
-    if (e != null) {
-      try {
-        listener.membershipFailure(reason, e);
-      } catch (RuntimeException re) {
-        logger.warn("Exception caught while shutting down", re);
+      // first shut down communication so we don't do any more harm to other
+      // members
+      services.emergencyClose();
+    } finally {
+      if (e != null) {
+        try {
+          listener.membershipFailure(reason, e);
+        } catch (RuntimeException re) {
+          logger.warn("Exception caught while shutting down", re);
+        }

Review comment:
       solid
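The try/finally shape the change adopts can be shown in a small, self-contained sketch. This is not the actual Geode code. Services, Listener, UncleanShutdownSketch and runFailingCloseScenario() are hypothetical names, and only the emergencyClose() and membershipFailure() steps are kept; the shutdown-cause, cleanup-timer and lifecycle-listener steps are omitted. As in the new unit test, emergencyClose() throws, and the finally block still notifies the listener before the exception propagates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of the try/finally shape in the reviewed uncleanShutdown change.
 * Services and Listener are hypothetical stand-ins, not Geode's real interfaces.
 */
public class UncleanShutdownSketch {

  /** Hypothetical stand-in for the membership services being shut down. */
  interface Services {
    void emergencyClose();
  }

  /** Hypothetical stand-in for the upstream (cache-side) listener. */
  interface Listener {
    void membershipFailure(String reason, Throwable cause);
  }

  private final Services services;
  private final Listener listener;

  UncleanShutdownSketch(Services services, Listener listener) {
    this.services = services;
    this.listener = listener;
  }

  void uncleanShutdown(String reason, Exception e) {
    try {
      // Shut down communication first; in the failure scenario this step throws.
      services.emergencyClose();
    } finally {
      // Runs even if emergencyClose() threw, so the listener still learns of the failure.
      if (e != null) {
        try {
          listener.membershipFailure(reason, e);
        } catch (RuntimeException re) {
          // Do not let a listener failure mask the original exception.
          System.err.println("Exception caught while shutting down: " + re);
        }
      }
    }
  }

  /** Mirrors the unit test: emergencyClose() throws, yet the listener must be called. */
  public static List<String> runFailingCloseScenario() {
    List<String> notified = new ArrayList<>();
    IllegalStateException closeFailure = new IllegalStateException("close failed");
    UncleanShutdownSketch membership = new UncleanShutdownSketch(
        () -> { throw closeFailure; },
        (reason, cause) -> notified.add(reason));
    try {
      membership.uncleanShutdown("For testing", new Exception("member disconnected"));
      throw new AssertionError("expected the emergencyClose() failure to propagate");
    } catch (IllegalStateException thrown) {
      if (thrown != closeFailure) {
        throw new AssertionError("unexpected exception", thrown);
      }
    }
    return notified;
  }

  public static void main(String[] args) {
    System.out.println(runFailingCloseScenario()); // prints [For testing]
  }
}
```

Putting the notification in finally rather than in a catch lets the original IllegalStateException reach the caller unchanged, which is what the test's isEqualTo(expectedException) assertion checks.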




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Hang in ReplyProcessor21 when forced-disconnect does not establish a cancellation cause
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8473
>                 URL: https://issues.apache.org/jira/browse/GEODE-8473
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.13.0
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>
> I suspect this is due to the recent Membership refactoring. In a test that exposed GEODE-8467, I saw an application thread from before the forced-disconnect still hanging around, waiting for a response.
> {noformat}
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00000000ea5c43c0> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
>     at org.apache.geode.internal.cache.partitioned.SizeMessage$SizeResponse.waitBucketSizes(SizeMessage.java:344)
>     at org.apache.geode.internal.cache.PartitionedRegion.getSizeRemotely(PartitionedRegion.java:6752)
>     at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6703)
>     at org.apache.geode.internal.cache.PartitionedRegion.entryCount(PartitionedRegion.java:6685)
>     at org.apache.geode.internal.cache.PartitionedRegion.getRegionSize(PartitionedRegion.java:6657)
>     at org.apache.geode.internal.cache.LocalRegionDataView.entryCount(LocalRegionDataView.java:99)
>     at org.apache.geode.internal.cache.LocalRegion.entryCount(LocalRegion.java:2078)
>     at org.apache.geode.internal.cache.LocalRegion.size(LocalRegion.java:8288)
>     at util.TestHelper.getRegionStr(TestHelper.java:1669)
>     at util.TestHelper.regionHierarchyToString(TestHelper.java:1654)
>     at util.TestHelper.logRegionHierarchy(TestHelper.java:1639)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at hydra.MethExecutor.execute(MethExecutor.java:173)
>     at hydra.MethExecutor.execute(MethExecutor.java:141)
>     at hydra.TestTask.execute(TestTask.java:197)
>     at hydra.RemoteTestModule$1.run(RemoteTestModule.java:213)
> {noformat}
> ReplyProcessor21 uses a StoppableCountDownLatch to wait for a response. This latch loops, waiting for the countdown, but on each pass it also checks ClusterDistributionManager's CancelCriterion to see whether the system is shutting down. If so, it stops waiting for a response.
> Due to GEODE-8467, the thread that sets the CancelCriterion's shutdown "rootCause" is never started. Either Membership needs to ensure that this upward notification happens, or ClusterDistributionManager's CancelCriterion needs to check with the Services.Stopper in GMSMembership to see whether a "rootCause" has been established there.
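The waiting behaviour described in the issue, together with the second proposed fix, can be sketched in plain Java. This is an illustration under stated assumptions, not Geode's StoppableCountDownLatch or CancelCriterion. MembershipStopper, DistributionCancelCriterion, shutdownReason(), awaitOrCancel() and runScenario() are hypothetical, and the sketch returns false where a real stoppable latch would more likely throw a cancellation exception. The wait is split into short timed slices. Between slices the criterion is consulted, and if its own root cause was never set (the GEODE-8467 case), it falls back to the cause recorded on the membership side.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of a shutdown-aware reply wait; all types are hypothetical stand-ins. */
public class StoppableLatchSketch {

  /** Hypothetical stand-in for the membership-side Stopper that records a shutdown cause. */
  static final class MembershipStopper {
    private final AtomicReference<Throwable> shutdownCause = new AtomicReference<>();

    void setShutdownCause(Throwable cause) {
      shutdownCause.compareAndSet(null, cause); // keep only the first cause
    }

    Throwable getShutdownCause() {
      return shutdownCause.get();
    }
  }

  /** Hypothetical distribution-level cancel criterion with a membership fallback. */
  static final class DistributionCancelCriterion {
    // Normally set by the upward notification; in the GEODE-8467 scenario it stays null.
    private final AtomicReference<Throwable> rootCause = new AtomicReference<>();
    private final MembershipStopper membershipStopper;

    DistributionCancelCriterion(MembershipStopper membershipStopper) {
      this.membershipStopper = membershipStopper;
    }

    void setRootCause(Throwable cause) {
      rootCause.compareAndSet(null, cause);
    }

    /** Returns a shutdown reason, or null if the system is not shutting down. */
    String shutdownReason() {
      Throwable cause = rootCause.get();
      if (cause == null) {
        // Fallback: ask membership whether it has already recorded a cause.
        cause = membershipStopper.getShutdownCause();
      }
      return cause == null ? null : cause.toString();
    }
  }

  /** Waits in timed slices; true if the latch opened, false if shutdown was detected. */
  static boolean awaitOrCancel(CountDownLatch latch, DistributionCancelCriterion criterion,
      long sliceMillis) throws InterruptedException {
    while (true) {
      if (latch.await(sliceMillis, TimeUnit.MILLISECONDS)) {
        return true; // every expected reply arrived
      }
      if (criterion.shutdownReason() != null) {
        return false; // shutting down: stop waiting instead of hanging
      }
    }
  }

  /** Either the reply arrives, or only membership records a forced disconnect. */
  public static boolean runScenario(boolean replyArrives) {
    CountDownLatch repliesPending = new CountDownLatch(1);
    MembershipStopper stopper = new MembershipStopper();
    DistributionCancelCriterion criterion = new DistributionCancelCriterion(stopper);
    if (replyArrives) {
      repliesPending.countDown();
    } else {
      // The cause is recorded only on the membership side, never via setRootCause().
      stopper.setShutdownCause(new IllegalStateException("forced disconnect"));
    }
    try {
      return awaitOrCancel(repliesPending, criterion, 50);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for replies", ie);
    }
  }

  public static void main(String[] args) {
    System.out.println(runScenario(true) + " " + runScenario(false)); // prints "true false"
  }
}
```

Without the fallback in shutdownReason(), runScenario(false) would loop forever, which corresponds to the hang in the stack trace above.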



--
This message was sent by Atlassian Jira
(v8.3.4#803005)