You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/08/02 15:53:01 UTC

[GitHub] [hadoop] xkrogen commented on a change in pull request #3235: HDFS-16143. De-flake TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits

xkrogen commented on a change in pull request #3235:
URL: https://github.com/apache/hadoop/pull/3235#discussion_r681077707



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -488,4 +509,19 @@ private static MiniDFSCluster createMiniDFSCluster(Configuration conf,
         .build();
     return cluster;
   }
+
+  private static final class TestTimer extends Timer {

Review comment:
       I think we already have a class called `ManualTimer` or `FakeTimer`, can you check again?

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
##########
@@ -423,21 +430,39 @@ void triggerActiveLogRoll() {
     try {
       future = rollEditsRpcExecutor.submit(getNameNodeProxy());
       future.get(rollEditsTimeoutMs, TimeUnit.MILLISECONDS);
-      lastRollTimeMs = monotonicNow();
+      this.lastRollTimeMs = timer.monotonicNow();
       lastRollTriggerTxId = lastLoadedTxnId;
-    } catch (ExecutionException e) {
+    } catch (ExecutionException | InterruptedException e) {
       LOG.warn("Unable to trigger a roll of the active NN", e);
     } catch (TimeoutException e) {
-      if (future != null) {
-        future.cancel(true);
-      }
+      future.cancel(true);
       LOG.warn(String.format(
           "Unable to finish rolling edits in %d ms", rollEditsTimeoutMs));
-    } catch (InterruptedException e) {
-      LOG.warn("Unable to trigger a roll of the active NN", e);
     }
   }
 
+  /**
+   * This is only to be used by tests. For source code, the only way to
+   * set timer is by using EditLogTailer constructor.
+   *
+   * @param timer Timer instance provided by tests.
+   */
+  @VisibleForTesting
+  @SuppressWarnings("checkstyle:hiddenfield")

Review comment:
       can you change the name of the parameter (e.g. `newTimer`) instead of suppressing the warning?

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -394,9 +396,10 @@ public void testStandbyTriggersLogRollsWhenTailInProgressEdits()
     // Time in seconds to wait before checking if edit logs are rolled while
     // expecting no edit log roll
     final int noLogRollWaitTime = 2;
+
     // Time in seconds to wait before checking if edit logs are rolled while
-    // expecting edit log roll
-    final int logRollWaitTime = 3;
+    // expecting edit log roll.
+    final int logRollWaitTime = 7;

Review comment:
       seems we don't need to change this anymore now that we're using the fake timer?

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
##########
@@ -423,21 +423,22 @@ void triggerActiveLogRoll() {
     try {
       future = rollEditsRpcExecutor.submit(getNameNodeProxy());
       future.get(rollEditsTimeoutMs, TimeUnit.MILLISECONDS);
-      lastRollTimeMs = monotonicNow();
+      resetLastRollTimeMs();
       lastRollTriggerTxId = lastLoadedTxnId;
-    } catch (ExecutionException e) {
+    } catch (ExecutionException | InterruptedException e) {
       LOG.warn("Unable to trigger a roll of the active NN", e);
     } catch (TimeoutException e) {
-      if (future != null) {
-        future.cancel(true);
-      }
+      future.cancel(true);

Review comment:
       But also this is unrelated to this JIRA, can you put up a separate JIRA/PR for it? Particular since this JIRA is just aimed at fixing test flakiness, it's better to minimize any production code changes.

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -429,19 +432,43 @@ public void testStandbyTriggersLogRollsWhenTailInProgressEdits()
       waitForStandbyToCatchUpWithInProgressEdits(standby, activeTxId,
           standbyCatchupWaitTime);
 
+      long curTime = standby.getNamesystem().getEditLogTailer().getTimer()
+          .monotonicNow();
+      // Set TestTimer that returns (curTime + 1 sec) ms while calling
+      // timer.monotonicNow(). This returned value is used by
+      // EditLogTailer.tooLongSinceLastLoad() to determine whether
+      // EditLogTailer should roll the log. Since this test is time
+      // and resource sensitive, we are using custom Timer instance
+      // so that tooLongSinceLastLoad() always returns false
+      // until we change Timer instance in EditLogTailer.

Review comment:
       Can we simplify this comment to just:
   ```
   // move time forward by 1 second, which is not enough to trigger a log roll
   ```
   Better yet, we could save the log roll time in line 405 above:
   ```
   long logRollPeriod = standbyCatchupWaitTime + noLogRollWaitTime + 1;
   ```
   and then set the time to `curTime + TimeUnit.SECONDS.toMillis(logRollPeriod / 2)` -- I think this should be sufficiently self-explanatory to not require a comment, since you can clearly see we're only moving time by half the log-roll period.

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
##########
@@ -172,14 +171,22 @@
    */
   private final long maxTxnsPerLock;
 
-  public EditLogTailer(FSNamesystem namesystem, Configuration conf) {
+  /**
+   * Timer instance to be set only using constructor.
+   * Only tests can reassign this by using setTimerForTests().
+   * For source code, this timer instance should be treated as final.
+   */
+  private Timer timer;
+
+  public EditLogTailer(FSNamesystem namesystem, Configuration conf,
+      Timer timer) {

Review comment:
       IMO it's better to add a new constructor to avoid changes in the calling classes:
   
   ```
   public EditLogTailer(FSNamesystem namesystem, Configuration conf) {
     this(namesystem, conf, new Timer());
   }
   
   EditLogTailer(FSNamesystem namesystem, Configuration conf,
       Timer timer) {
     this.tailerThread = ...
     this.conf = conf;
     this.namesystem = namesystem;
     this.timer = timer;
     ...
   }
   ```
   
   But upon looking further, it seems that we never actually pass in a custom timer to the constructor; we only change it via `setTimer`, so I think we can get rid of the version that has a custom timer?

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java
##########
@@ -429,19 +432,43 @@ public void testStandbyTriggersLogRollsWhenTailInProgressEdits()
       waitForStandbyToCatchUpWithInProgressEdits(standby, activeTxId,
           standbyCatchupWaitTime);
 
+      long curTime = standby.getNamesystem().getEditLogTailer().getTimer()
+          .monotonicNow();
+      // Set TestTimer that returns (curTime + 1 sec) ms while calling
+      // timer.monotonicNow(). This returned value is used by
+      // EditLogTailer.tooLongSinceLastLoad() to determine whether
+      // EditLogTailer should roll the log. Since this test is time
+      // and resource sensitive, we are using custom Timer instance
+      // so that tooLongSinceLastLoad() always returns false
+      // until we change Timer instance in EditLogTailer.
+      standby.getNamesystem().getEditLogTailer().setTimerForTest(
+          new TestTimer(curTime + TimeUnit.SECONDS.toMillis(1)));

Review comment:
       Instead of changing the timer instance multiple times, can you store a reference to the timer, then adjust the time returned by the timer?
   ```
   TestTimer timer = new TestTimer(...);
   standby.getNamesystem().getEditLogTailer().setTimerForTest(timer);
   timer.setTime(...);
   ...
   timer.setTime(...);
   ```

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
##########
@@ -423,21 +423,22 @@ void triggerActiveLogRoll() {
     try {
       future = rollEditsRpcExecutor.submit(getNameNodeProxy());
       future.get(rollEditsTimeoutMs, TimeUnit.MILLISECONDS);
-      lastRollTimeMs = monotonicNow();
+      resetLastRollTimeMs();
       lastRollTriggerTxId = lastLoadedTxnId;
-    } catch (ExecutionException e) {
+    } catch (ExecutionException | InterruptedException e) {
       LOG.warn("Unable to trigger a roll of the active NN", e);
     } catch (TimeoutException e) {
-      if (future != null) {
-        future.cancel(true);
-      }
+      future.cancel(true);

Review comment:
       I see. Both `ExecutionException` and `InterruptedException` will also only be thrown by `Future#get`, so I think the try-catch should probably look like:
   ```
   Future<Void> future = rollEditsRpcExecutor.submit(getNameNodeProxy());
   try {
     future.get(....)
     ...
   } catch (...) {
     ...
   }
   ```
   This will make it more clear why `future` will not be null when we get into the `catch` block.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org