Posted to hdfs-dev@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2014/02/13 17:13:20 UTC

[jira] [Created] (HDFS-5947) Improve dead node detection and handling

Daryn Sharp created HDFS-5947:
---------------------------------

             Summary: Improve dead node detection and handling
                 Key: HDFS-5947
                 URL: https://issues.apache.org/jira/browse/HDFS-5947
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
            Reporter: Daryn Sharp


When {{HeartbeatManager.heartbeatCheck}} runs, it:
# Scans all DNs to count dead nodes
# Processes the first dead node
# Loops to re-scan all DNs again if a dead node was found

Processing the dead node holds the namesystem write lock while removing the node from the blockmap.  It also appears to do a lot of work to immediately re-adjust the replication queues.  All of this processing might be too expensive to do while holding the write lock, e.g. if a rack or two is lost, because each dead node costs another full scan of the DNs and another pass under the write lock.
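
To make the cost concrete, here is a minimal, self-contained model of the scan/process/re-scan pattern described above. It is only a sketch, not the actual {{HeartbeatManager}} code: the {{Node}} and {{Stats}} classes, the {{cluster}} helper, and the {{namesystemLock}} field are hypothetical stand-ins, and removing the node from a list stands in for the real blockmap removal and replication queue work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeartbeatCheckSketch {
    /** Hypothetical stand-in for a datanode descriptor. */
    static final class Node {
        final String name;
        final boolean dead;
        Node(String name, boolean dead) { this.name = name; this.dead = dead; }
    }

    /** Counters collected by the model; not part of HDFS. */
    static final class Stats {
        int passes;            // full scans over the node list
        long visits;           // individual node checks across all scans
        int lockAcquisitions;  // times the write lock was taken
    }

    // Hypothetical stand-in for the namesystem lock.
    static final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();

    /** Models the loop: scan everything, process the first dead node, repeat. */
    static Stats heartbeatCheck(List<Node> nodes) {
        Stats stats = new Stats();
        boolean allAlive = false;
        while (!allAlive) {
            stats.passes++;
            Node firstDead = null;
            int numDead = 0; // counted on every pass, as in the description
            for (Node n : nodes) {
                stats.visits++;
                if (n.dead) {
                    numDead++;
                    if (firstDead == null) {
                        firstDead = n;
                    }
                }
            }
            allAlive = (numDead == 0);
            if (!allAlive) {
                namesystemLock.writeLock().lock();
                try {
                    stats.lockAcquisitions++;
                    // Assumption: stands in for blockmap removal and
                    // replication queue updates done under the write lock.
                    nodes.remove(firstDead);
                } finally {
                    namesystemLock.writeLock().unlock();
                }
            }
        }
        return stats;
    }

    /** Builds a hypothetical cluster whose first `dead` nodes are dead. */
    static List<Node> cluster(int total, int dead) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            nodes.add(new Node("dn" + i, i < dead));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Stats s = heartbeatCheck(cluster(100, 10));
        System.out.println(s.passes + " passes, " + s.visits
            + " node visits, " + s.lockAcquisitions + " write-lock acquisitions");
    }
}
```

In this model, losing D of N nodes takes D + 1 full passes and D separate write-lock acquisitions. With 100 nodes and 10 dead, that is 11 passes and 1045 node visits, so the scanning work grows with the product of cluster size and the number of dead nodes.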



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)