You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2021/04/21 19:46:09 UTC

[GitHub] [geode] Bill opened a new pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Bill opened a new pull request #6360:
URL: https://github.com/apache/geode/pull/6360


   [GEODE-9180](https://issues.apache.org/jira/browse/GEODE-9180)
   
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   - [x] Does `gradlew build` run cleanly?
   
   - [x] Have you written or updated unit tests to verify your changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill commented on pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill commented on pull request #6360:
URL: https://github.com/apache/geode/pull/6360#issuecomment-824408188


   @kamilla1201 will you please review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill merged pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill merged pull request #6360:
URL: https://github.com/apache/geode/pull/6360


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618738780



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;
+
+    @Override
+    public void run() {
+      Thread.currentThread().setName("Geode Heartbeat Sender");
+      sendPeriodicHeartbeats(Thread::sleep, System::nanoTime, logger::warn);
+    }
+
+    @VisibleForTesting
+    void sendPeriodicHeartbeats(final Sleeper sleeper,
+        final NanoTimer nanoTimer,
+        final Warner warner) {
+      while (!isStopping && !services.getCancelCriterion().isCancelInProgress()) {
+        try {
+          final long timeBeforeSleep = nanoTimer.nanoTime();
+          sleeper.sleep(sleepPeriodMillis);
+          final long timeAfterSleep = nanoTimer.nanoTime();
+          final long asleepNanos = timeAfterSleep - timeBeforeSleep;
+          if (asleepNanos > sleepLimitNanos) {
+            warner.warn(
+                String.format(
+                    "Failure detection heartbeat-generation thread overslept by more than a full period. Asleep time: %,d nanoseconds. Period: %,d nanoseconds.",
+                    asleepNanos, sleepPeriodNanos));
+          }
+        } catch (InterruptedException e) {

Review comment:
       nice catch! coming in next commit…




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
bschuchardt commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618641647



##########
File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitorJUnitTest.java
##########
@@ -999,6 +1000,52 @@ public void testDoTCPCheckMemberWithUnkownStatus() throws Exception {
     executeTestDoTCPCheck(GMSHealthMonitor.ERROR + 100, false);
   }
 
+  @Test
+  public void heartbeatOversleepCausesWarning() {
+    testHeartbeatSleepScenario(period -> 2 * period + 1,

Review comment:
       maybe replace the "2"s in this test with the logical-interval constant from GMSHealthMonitor




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
bschuchardt commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618645034



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;
+
+    @Override
+    public void run() {
+      Thread.currentThread().setName("Geode Heartbeat Sender");
+      sendPeriodicHeartbeats(Thread::sleep, System::nanoTime, logger::warn);
+    }
+
+    @VisibleForTesting
+    void sendPeriodicHeartbeats(final Sleeper sleeper,

Review comment:
       I like this nice injection of time-related functions and the log message consumer




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] bschuchardt commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
bschuchardt commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618645034



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;
+
+    @Override
+    public void run() {
+      Thread.currentThread().setName("Geode Heartbeat Sender");
+      sendPeriodicHeartbeats(Thread::sleep, System::nanoTime, logger::warn);
+    }
+
+    @VisibleForTesting
+    void sendPeriodicHeartbeats(final Sleeper sleeper,

Review comment:
       I like this nice injection of time-related functions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] kamilla1201 commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
kamilla1201 commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618518085



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;

Review comment:
       I don't quite understand what `sleepLimitNanos` is for. Why is `sleepPeriodNanos` multiplied by 2 here? Is it because `LOGICAL_INTERVAL`'s default value is 2?

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;
+
+    @Override
+    public void run() {
+      Thread.currentThread().setName("Geode Heartbeat Sender");
+      sendPeriodicHeartbeats(Thread::sleep, System::nanoTime, logger::warn);
+    }
+
+    @VisibleForTesting
+    void sendPeriodicHeartbeats(final Sleeper sleeper,
+        final NanoTimer nanoTimer,
+        final Warner warner) {
+      while (!isStopping && !services.getCancelCriterion().isCancelInProgress()) {
+        try {
+          final long timeBeforeSleep = nanoTimer.nanoTime();
+          sleeper.sleep(sleepPeriodMillis);
+          final long timeAfterSleep = nanoTimer.nanoTime();
+          final long asleepNanos = timeAfterSleep - timeBeforeSleep;
+          if (asleepNanos > sleepLimitNanos) {
+            warner.warn(
+                String.format(
+                    "Failure detection heartbeat-generation thread overslept by more than a full period. Asleep time: %,d nanoseconds. Period: %,d nanoseconds.",
+                    asleepNanos, sleepPeriodNanos));
+          }
+        } catch (InterruptedException e) {

Review comment:
       looks like we should set the interrupt flag here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618734691



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;

Review comment:
       I have somewhat arbitrarily chosen two periods as the sleep limit. I could have e.g. chosen a period and a half as a limit. Or I could have chosen three periods as a limit. Or I could've chosen something else entirely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618734691



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;

Review comment:
       I have somewhat arbitrarily chosen two periods as the sleep limit (i.e. we log a warning if we sleep more than one period beyond the time we intended to wake). I could have e.g. chosen a period and a half as a limit. Or I could have chosen three periods as a limit. Or I could've chosen something else entirely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [geode] Bill commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

Posted by GitBox <gi...@apache.org>.
Bill commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618739346



##########
File path: geode-membership/src/integrationTest/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitorJUnitTest.java
##########
@@ -999,6 +1000,52 @@ public void testDoTCPCheckMemberWithUnkownStatus() throws Exception {
     executeTestDoTCPCheck(GMSHealthMonitor.ERROR + 100, false);
   }
 
+  @Test
+  public void heartbeatOversleepCausesWarning() {
+    testHeartbeatSleepScenario(period -> 2 * period + 1,

Review comment:
       actually that's a whole different `2`. This `2` is the (arbitrary) oversleep criterion. Will fix in next commit…




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org