Posted to hdfs-dev@hadoop.apache.org by "snodawn (Jira)" <ji...@apache.org> on 2020/08/27 12:57:00 UTC

[jira] [Created] (HDFS-15544) Standby namenode EditLogTailerThread shouldn't acquire a lock interruptibly when tailing edits

snodawn created HDFS-15544:
------------------------------

             Summary: Standby namenode EditLogTailerThread shouldn't acquire a lock interruptibly when tailing edits
                 Key: HDFS-15544
                 URL: https://issues.apache.org/jira/browse/HDFS-15544
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.3.0
            Reporter: snodawn


In my experience, the active namenode sometimes holds the write lock for a long time in rollEditLog:
{code:java}
Longest write-lock held at 2020-08-27 12:59:30,773+0800 for 66067ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:283)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:258)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1610)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4667)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
{code}
This happens because the standby namenode may not call triggerActiveLogRoll() at the interval set by dfs.ha.log-roll.period after its last checkpoint, which can leave the active namenode with a large edit log to roll.

When tailing edits, the standby namenode's EditLogTailerThread acquires the same lock that the checkpoint thread holds. The checkpoint thread may spend a long time saving the fsimage file (about 4 minutes in my experience), so triggerActiveLogRoll() in EditLogTailerThread is not called at the interval set by dfs.ha.log-roll.period.

I propose that EditLogTailerThread should not acquire the lock with cpLockInterruptibly(); a non-blocking tryLock() is enough.
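
The difference can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code: the class TailerLockSketch, the method tailOnce(), and the plain ReentrantLock standing in for the checkpoint lock are all hypothetical, and it assumes the tailer loop checks on every pass whether a log roll is due. With tryLock(), a pass that finds the lock busy skips tailing and returns at once, so the next roll check is not delayed by a long checkpoint.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class TailerLockSketch {
    // Hypothetical stand-in for the standby namenode's checkpoint lock.
    static final ReentrantLock cpLock = new ReentrantLock();

    /**
     * One pass of a simplified tailer loop. Returns a description of what
     * the pass did so the behavior can be observed.
     */
    static String tailOnce(ReentrantLock lock, boolean rollDue) {
        StringBuilder actions = new StringBuilder();
        if (rollDue) {
            // Placeholder for triggerActiveLogRoll(); needs no checkpoint lock here.
            actions.append("roll ");
        }
        // Non-blocking attempt: if a checkpoint holds the lock, skip this pass
        // instead of waiting for the checkpoint to finish.
        if (lock.tryLock()) {
            try {
                actions.append("tail"); // placeholder for tailing edits
            } finally {
                lock.unlock();
            }
        } else {
            actions.append("skip");
        }
        return actions.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulated checkpoint thread that holds the lock until released.
        Thread checkpointer = new Thread(() -> {
            cpLock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cpLock.unlock();
            }
        });
        checkpointer.start();
        held.await();

        System.out.println(tailOnce(cpLock, true)); // lock busy: prints "roll skip"

        release.countDown();
        checkpointer.join();

        System.out.println(tailOnce(cpLock, true)); // lock free: prints "roll tail"
    }
}
```

A blocking acquire in place of tryLock() would make the first call wait until the simulated checkpoint releases the lock, which is the delay described above.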
