You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2022/06/03 19:03:23 UTC

[GitHub] [geode] pivotal-jbarrett commented on a diff in pull request #7751: GEODE-8977: change ThreadMonitor to reduce how long it does a "stop the world" ThreadDump vm op

pivotal-jbarrett commented on code in PR #7751:
URL: https://github.com/apache/geode/pull/7751#discussion_r889254461


##########
geode-core/src/main/java/org/apache/geode/internal/monitoring/ThreadsMonitoringProcess.java:
##########
@@ -119,29 +119,69 @@ private int checkForStuckThreads(Collection<AbstractExecutor> executors, long cu
     return result;
   }
 
+  /**
+   * If set to true, then the JVM will be asked for what locks a thread holds.
+   * This is extra expensive to ask for on some JVMs so be careful setting this to true.
+   */
+  private static final boolean SHOW_LOCKS = Boolean.getBoolean("gemfire.threadmonitor.showLocks");
+  /**
+   * If set to true, then the JVM will be asked for all potential stuck threads with one call.
+   * Since getThreadInfo on many JVMs, stops ALL threads from running, and since getting info
+   * on multiple threads with one call is additional work, setting this can cause an extra long
+   * stop the world that can then cause other problems (like a forced disconnect).
+   * So be careful setting this to true.
+   */
+  private static final boolean BATCH_CALLS = Boolean.getBoolean("gemfire.threadmonitor.batchCalls");
+
+  private static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
+    return createThreadInfoMap(stuckThreadIds, SHOW_LOCKS, BATCH_CALLS);
+  }
+
   @VisibleForTesting
-  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
-    /*
-     * NOTE: at least some implementations of getThreadInfo(long[], boolean, boolean)
-     * will core dump if the long array contains a duplicate value.
-     * That is why stuckThreadIds is a Set instead of a List.
-     */
+  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds,
+      final boolean showLocks, final boolean batchCalls) {
     if (stuckThreadIds.isEmpty()) {
       return Collections.emptyMap();
     }
-    long[] ids = new long[stuckThreadIds.size()];
-    int idx = 0;
-    for (long id : stuckThreadIds) {
-      ids[idx] = id;
-      idx++;
-    }
-    ThreadInfo[] threadInfos = ManagementFactory.getThreadMXBean().getThreadInfo(ids, true, true);
+    logger.info("Obtaining ThreadInfo for " + stuckThreadIds.size()

Review Comment:
   Why are we using string concatenation here rather than the typical formatted arguments style?



##########
geode-core/src/main/java/org/apache/geode/internal/monitoring/ThreadsMonitoringProcess.java:
##########
@@ -119,29 +119,69 @@ private int checkForStuckThreads(Collection<AbstractExecutor> executors, long cu
     return result;
   }
 
+  /**
+   * If set to true, then the JVM will be asked for what locks a thread holds.
+   * This is extra expensive to ask for on some JVMs so be careful setting this to true.
+   */
+  private static final boolean SHOW_LOCKS = Boolean.getBoolean("gemfire.threadmonitor.showLocks");
+  /**
+   * If set to true, then the JVM will be asked for all potential stuck threads with one call.
+   * Since getThreadInfo on many JVMs, stops ALL threads from running, and since getting info
+   * on multiple threads with one call is additional work, setting this can cause an extra long
+   * stop the world that can then cause other problems (like a forced disconnect).
+   * So be careful setting this to true.
+   */
+  private static final boolean BATCH_CALLS = Boolean.getBoolean("gemfire.threadmonitor.batchCalls");
+
+  private static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
+    return createThreadInfoMap(stuckThreadIds, SHOW_LOCKS, BATCH_CALLS);
+  }
+
   @VisibleForTesting
-  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
-    /*
-     * NOTE: at least some implementations of getThreadInfo(long[], boolean, boolean)
-     * will core dump if the long array contains a duplicate value.
-     * That is why stuckThreadIds is a Set instead of a List.
-     */
+  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds,
+      final boolean showLocks, final boolean batchCalls) {
     if (stuckThreadIds.isEmpty()) {
       return Collections.emptyMap();
     }
-    long[] ids = new long[stuckThreadIds.size()];
-    int idx = 0;
-    for (long id : stuckThreadIds) {
-      ids[idx] = id;
-      idx++;
-    }
-    ThreadInfo[] threadInfos = ManagementFactory.getThreadMXBean().getThreadInfo(ids, true, true);
+    logger.info("Obtaining ThreadInfo for " + stuckThreadIds.size()
+        + " threads. Configuration: showLocks=" + showLocks + " batchCalls=" + batchCalls
+        + " This is an expensive operation for the JVM and on most JVMs causes all threads to be paused.");
     Map<Long, ThreadInfo> result = new HashMap<>();
-    for (ThreadInfo threadInfo : threadInfos) {
-      if (threadInfo != null) {
-        result.put(threadInfo.getThreadId(), threadInfo);
+    if (batchCalls) {

Review Comment:
   Rather than inlining the two modes can we split this into two separate methods for readability?



##########
geode-core/src/main/java/org/apache/geode/internal/monitoring/ThreadsMonitoringProcess.java:
##########
@@ -119,29 +119,69 @@ private int checkForStuckThreads(Collection<AbstractExecutor> executors, long cu
     return result;
   }
 
+  /**
+   * If set to true, then the JVM will be asked for what locks a thread holds.
+   * This is extra expensive to ask for on some JVMs so be careful setting this to true.
+   */
+  private static final boolean SHOW_LOCKS = Boolean.getBoolean("gemfire.threadmonitor.showLocks");
+  /**
+   * If set to true, then the JVM will be asked for all potential stuck threads with one call.
+   * Since getThreadInfo on many JVMs, stops ALL threads from running, and since getting info
+   * on multiple threads with one call is additional work, setting this can cause an extra long
+   * stop the world that can then cause other problems (like a forced disconnect).
+   * So be careful setting this to true.
+   */
+  private static final boolean BATCH_CALLS = Boolean.getBoolean("gemfire.threadmonitor.batchCalls");
+
+  private static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
+    return createThreadInfoMap(stuckThreadIds, SHOW_LOCKS, BATCH_CALLS);
+  }
+
   @VisibleForTesting
-  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds) {
-    /*
-     * NOTE: at least some implementations of getThreadInfo(long[], boolean, boolean)
-     * will core dump if the long array contains a duplicate value.
-     * That is why stuckThreadIds is a Set instead of a List.
-     */
+  public static Map<Long, ThreadInfo> createThreadInfoMap(Set<Long> stuckThreadIds,

Review Comment:
   This method is visible for testing yet there does not appear to be any tests for all the branches of this code. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org