You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2022/01/04 11:40:01 UTC

[GitHub] [hadoop] sodonnel commented on a change in pull request #3853: HDFS-16408. Negative LeaseRecheckIntervalMs will let LeaseMonitor loop forever and print huge amount of log

sodonnel commented on a change in pull request #3853:
URL: https://github.com/apache/hadoop/pull/3853#discussion_r778016777



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
##########
@@ -530,35 +530,37 @@ public void setLeasePeriod(long softLimit, long hardLimit) {
     /** Check leases periodically. */
     @Override
     public void run() {
-      for(; shouldRunMonitor && fsnamesystem.isRunning(); ) {
-        boolean needSync = false;
-        try {
-          // sleep now to avoid infinite loop if an exception was thrown.
-          Thread.sleep(fsnamesystem.getLeaseRecheckIntervalMs());
-
-          // pre-filter the leases w/o the fsn lock.
-          Collection<Lease> candidates = getExpiredCandidateLeases();
-          if (candidates.isEmpty()) {
-            continue;
-          }
-
-          fsnamesystem.writeLockInterruptibly();
+      try {
+        for (; shouldRunMonitor && fsnamesystem.isRunning(); ) {
+          boolean needSync = false;
           try {
-            if (!fsnamesystem.isInSafeMode()) {
-              needSync = checkLeases(candidates);
+            // sleep now to avoid infinite loop if an exception was thrown.
+            Thread.sleep(fsnamesystem.getLeaseRecheckIntervalMs());
+
+            // pre-filter the leases w/o the fsn lock.
+            Collection<Lease> candidates = getExpiredCandidateLeases();
+            if (candidates.isEmpty()) {
+              continue;
             }
-          } finally {
-            fsnamesystem.writeUnlock("leaseManager");
-            // lease reassignments should to be sync'ed.
-            if (needSync) {
-              fsnamesystem.getEditLog().logSync();
+
+            fsnamesystem.writeLockInterruptibly();
+            try {
+              if (!fsnamesystem.isInSafeMode()) {
+                needSync = checkLeases(candidates);
+              }
+            } finally {
+              fsnamesystem.writeUnlock("leaseManager");
+              // lease reassignments should to be sync'ed.
+              if (needSync) {
+                fsnamesystem.getEditLog().logSync();
+              }
             }
+          } catch (InterruptedException ie) {
+            LOG.debug("{} is interrupted", name, ie);
           }
-        } catch(InterruptedException ie) {
-          LOG.debug("{} is interrupted", name, ie);
-        } catch(Throwable e) {
-          LOG.warn("Unexpected throwable: ", e);
         }
+      } catch (Throwable e) {

Review comment:
       If we move the exception to here, will any unexpected exception cause the lease monitor thread to exit? Will anything restart it?
   
   The lease monitor is a critical service running inside the namenode - if it fails completely, then it should probably cause the Namenode to abort too.
   
   If it is having problems, then it is probably good it creates some noise in the logs. Otherwise, if the Namenode does not exit too, files can fail to get closed by the lease monitor thread much later than the error occurs, which will may be hard to trace to the lease monitor error / exiting.
   
   The example you mentioned, with a negative sleep interval - we should fix that to ensure the interval cannot get set to a value less than or equal to zero. After doing that, I am not sure if we really need to adjust the exception handler here, and allow the lease monitor to sleep and re-try as it already does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org