You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2021/04/22 15:50:18 UTC

[GitHub] [geode] kamilla1201 commented on a change in pull request #6360: GEODE-9180: heartbeat producer logs a warning when it misses beats

kamilla1201 commented on a change in pull request #6360:
URL: https://github.com/apache/geode/pull/6360#discussion_r618518085



##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;

Review comment:
       I don't quite understand what `sleepLimitNanos` is for. Why is `sleepPeriodNanos` multiplied by 2 here? Is it because `LOGICAL_INTERVAL`'s default value is 2?

##########
File path: geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
##########
@@ -1538,4 +1463,114 @@ public void run() {
   public MembershipStatistics getStats() {
     return this.stats;
   }
+
+  @FunctionalInterface
+  interface Sleeper {
+    void sleep(long millis) throws InterruptedException;
+  }
+
+  @FunctionalInterface
+  interface NanoTimer {
+    long nanoTime();
+  }
+
+  @FunctionalInterface
+  interface Warner {
+    void warn(String message);
+  }
+
+  class Heart implements Runnable {
+
+    public final long sleepPeriodMillis = memberTimeout / LOGICAL_INTERVAL;
+    public final long sleepPeriodNanos =
+        TimeUnit.NANOSECONDS.convert(sleepPeriodMillis, TimeUnit.MILLISECONDS);
+    public final long sleepLimitNanos = 2 * sleepPeriodNanos;
+
+    @Override
+    public void run() {
+      Thread.currentThread().setName("Geode Heartbeat Sender");
+      sendPeriodicHeartbeats(Thread::sleep, System::nanoTime, logger::warn);
+    }
+
+    @VisibleForTesting
+    void sendPeriodicHeartbeats(final Sleeper sleeper,
+        final NanoTimer nanoTimer,
+        final Warner warner) {
+      while (!isStopping && !services.getCancelCriterion().isCancelInProgress()) {
+        try {
+          final long timeBeforeSleep = nanoTimer.nanoTime();
+          sleeper.sleep(sleepPeriodMillis);
+          final long timeAfterSleep = nanoTimer.nanoTime();
+          final long asleepNanos = timeAfterSleep - timeBeforeSleep;
+          if (asleepNanos > sleepLimitNanos) {
+            warner.warn(
+                String.format(
+                    "Failure detection heartbeat-generation thread overslept by more than a full period. Asleep time: %,d nanoseconds. Period: %,d nanoseconds.",
+                    asleepNanos, sleepPeriodNanos));
+          }
+        } catch (InterruptedException e) {

Review comment:
       looks like we should set the interrupt flag here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org