You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/07/07 19:09:45 UTC
[GitHub] [beam] pabloem commented on a change in pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…

pabloem commented on a change in pull request #12143:
URL: https://github.com/apache/beam/pull/12143#discussion_r451078998



##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContext.java
##########
@@ -194,6 +200,18 @@ public DataflowExecutionState(
       this.metricsContainer = metricsContainer;
     }
 
+    public DataflowExecutionState(
+        NameContext nameContext,
+        String stateName,
+        @Nullable String requestingStepName,
+        @Nullable Integer inputIndex,
+        @Nullable MetricsContainer metricsContainer,
+        ProfileScope profileScope,
+        Clock clock) {
+      this(nameContext, stateName, requestingStepName, inputIndex, metricsContainer, profileScope);
+      this.clock = clock;
+    }

Review comment:
       Instead, add Clock to the constructor above (lines 188-200), make the clock member final, and pass Clock.SYSTEM in the constructor without it?

##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowOperationContext.java
##########
@@ -264,7 +276,49 @@ public void reportLull(Thread trackedThread, long millis) {
       logRecord.setLoggerName(DataflowOperationContext.LOG.getName());
 
       // Publish directly in the context of this specific ExecutionState.
-      DataflowWorkerLoggingInitializer.getLoggingHandler().publish(this, logRecord);
+      DataflowWorkerLoggingHandler dataflowLoggingHandler =
+          DataflowWorkerLoggingInitializer.getLoggingHandler();
+      dataflowLoggingHandler.publish(this, logRecord);
+
+      if (shouldLogFullThreadDump()) {
+        Map<Thread, StackTraceElement[]> threadSet = Thread.getAllStackTraces();
+        for (Map.Entry<Thread, StackTraceElement[]> entry : threadSet.entrySet()) {
+          Thread thread = entry.getKey();
+          StackTraceElement[] stackTrace = entry.getValue();
+          StringBuilder message = new StringBuilder();
+          message.append(thread.toString()).append(":\n");
+          message.append(getStackTraceForLullMessage(stackTrace));
+          logRecord = new LogRecord(Level.INFO, message.toString());
+          logRecord.setLoggerName(DataflowOperationContext.LOG.getName());
+          dataflowLoggingHandler.publish(this, logRecord);
+        }
+      }
+    }
+
+    // A full thread dump is performed at most once every 20 minutes.
+    private static final long LOG_LULL_FULL_THREAD_DUMP_MS = 20 * 60 * 1000;
+
+    // Last time when a full thread dump was performed.
+    private long lastFullThreadDumpMillis = 0;
+
+    private boolean shouldLogFullThreadDump() {

Review comment:
       This means that a full thread dump will happen the very first time a lull is logged. Lulls are not that unusual, right? We want full thread dumps for pipelines that are truly stuck, not for pipelines that are experiencing a slow stage.
   
   Perhaps we should use the `millis` argument from `reportLull`, and only report when millis passes 20 minutes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org