You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@nemo.apache.org by GitBox <gi...@apache.org> on 2021/03/31 09:06:37 UTC

[GitHub] [incubator-nemo] wonook commented on a change in pull request #307: [MINOR] Avoid yarn conflict + Save Metrics after waiting to get full data

wonook commented on a change in pull request #307:
URL: https://github.com/apache/incubator-nemo/pull/307#discussion_r604721163



##########
File path: runtime/master/src/main/java/org/apache/nemo/runtime/master/RuntimeMaster.java
##########
@@ -219,11 +224,32 @@ public void recordIRDAGMetrics(final IRDAG irdag, final String planId) {
    * Flush metrics.
    */
   public void flushMetrics() {
-    // send metric flush request to all executors
-    metricManagerMaster.sendMetricFlushRequest();
+    if (metricCountDownLatch.getCount() == 0) {
+      metricCountDownLatch = new CountDownLatch(executorRegistry.getNumberOfRunningExecutors());
+      // send metric flush request to all executors
+      metricManagerMaster.sendMetricFlushRequest();
+    }
+  }
 
+  /**
+   * Save metrics.
+   */
+  public void saveMetrics() {

Review comment:
       I think this method should be called in the NemoDriver.java file, where it originally requests to save the metrics after sending the flush request.

##########
File path: runtime/master/src/main/java/org/apache/nemo/runtime/master/RuntimeMaster.java
##########
@@ -219,11 +224,32 @@ public void recordIRDAGMetrics(final IRDAG irdag, final String planId) {
    * Flush metrics.
    */
   public void flushMetrics() {
-    // send metric flush request to all executors
-    metricManagerMaster.sendMetricFlushRequest();
+    if (metricCountDownLatch.getCount() == 0) {
+      metricCountDownLatch = new CountDownLatch(executorRegistry.getNumberOfRunningExecutors());
+      // send metric flush request to all executors
+      metricManagerMaster.sendMetricFlushRequest();
+    }
+  }
 
+  /**
+   * Save metrics.
+   */
+  public void saveMetrics() {
+    try {
+      if (!metricCountDownLatch.await(METRIC_ARRIVE_TIMEOUT, TimeUnit.MILLISECONDS)) {

Review comment:
       Have you checked if the job completion time metric does not get affected by the metric-waiting procedure?

##########
File path: runtime/master/src/main/java/org/apache/nemo/runtime/master/metric/MetricStore.java
##########
@@ -245,6 +246,8 @@ public void dumpAllMetricToFile(final String filePath) {
     try (BufferedWriter writer = new BufferedWriter(new FileWriter(filePath))) {
       final String jsonDump = dumpAllMetricToJson();
       writer.write(jsonDump);
+    } catch (final FileNotFoundException e) {
+      LOG.warn("fail to write metric to local file: {}", e);

Review comment:
       fail -> Failed

##########
File path: runtime/master/src/main/java/org/apache/nemo/runtime/master/metric/MetricStore.java
##########
@@ -245,6 +246,8 @@ public void dumpAllMetricToFile(final String filePath) {
     try (BufferedWriter writer = new BufferedWriter(new FileWriter(filePath))) {
       final String jsonDump = dumpAllMetricToJson();
       writer.write(jsonDump);
+    } catch (final FileNotFoundException e) {
+      LOG.warn("fail to write metric to local file: {}", e);

Review comment:
       or maybe "Failure while writing metric ..."

##########
File path: runtime/master/src/main/java/org/apache/nemo/runtime/master/RuntimeMaster.java
##########
@@ -274,6 +300,8 @@ public void terminate() {
       Thread.currentThread().interrupt();
     }
 
+    saveMetrics();

Review comment:
       This is also related to the NemoDriver.java comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org