Posted to hdfs-dev@hadoop.apache.org by "Kihwal Lee (JIRA)" <ji...@apache.org> on 2015/08/31 16:34:46 UTC
[jira] [Created] (HDFS-8995) Flaw in registration bookeeping can make DN die on reconnect
Kihwal Lee created HDFS-8995:
--------------------------------
Summary: Flaw in registration bookeeping can make DN die on reconnect
Key: HDFS-8995
URL: https://issues.apache.org/jira/browse/HDFS-8995
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical
Normally, datanodes re-register with the namenode when it has been unreachable for longer than the heartbeat expiration interval and then becomes reachable again. Datanodes keep retrying RPC calls such as incremental block reports and heartbeats, and when one of these calls finally gets through, the namenode tells the datanode to re-register.
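The normal reconnect path described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Hadoop code: the class names `NameNodeSim` and `DataNodeSim`, the `"REGISTER"`/`"OK"`/`"RETRY"` strings, and the `HEARTBEAT_EXPIRY` value are all hypothetical, only heartbeats are modeled (incremental block reports are omitted), and the registration bookkeeping flaw reported in this issue is not reproduced.

```python
from dataclasses import dataclass, field

HEARTBEAT_EXPIRY = 10.0  # hypothetical expiry interval in seconds, not an HDFS default


@dataclass
class NameNodeSim:
    """Hypothetical toy namenode that tracks datanodes by last-heartbeat time."""
    reachable: bool = True
    last_seen: dict = field(default_factory=dict)

    def register(self, dn_id: str, now: float) -> None:
        # Record (or refresh) the datanode's registration.
        self.last_seen[dn_id] = now

    def expire_dead_nodes(self, now: float) -> None:
        # Forget any datanode whose last heartbeat is older than the expiry interval.
        for dn_id, seen in list(self.last_seen.items()):
            if now - seen > HEARTBEAT_EXPIRY:
                del self.last_seen[dn_id]

    def heartbeat(self, dn_id: str, now: float) -> str:
        if not self.reachable:
            raise ConnectionError("namenode unreachable")
        self.expire_dead_nodes(now)
        if dn_id not in self.last_seen:
            return "REGISTER"  # unknown or expired node: tell it to re-register
        self.last_seen[dn_id] = now
        return "OK"


@dataclass
class DataNodeSim:
    """Hypothetical toy datanode that retries heartbeats until one gets through."""
    dn_id: str
    registrations: int = 0

    def tick(self, nn: NameNodeSim, now: float) -> str:
        try:
            reply = nn.heartbeat(self.dn_id, now)
        except ConnectionError:
            return "RETRY"  # namenode unreachable; try again on the next tick
        if reply == "REGISTER":
            # Obey the namenode's command and re-register.
            nn.register(self.dn_id, now)
            self.registrations += 1
        return reply
```

For example, a datanode that registers at t=0, heartbeats at t=3, and then cannot reach the namenode until t=25 gets `"REGISTER"` back on its first successful heartbeat, because its entry expired after 10 seconds of silence; after re-registering, its next heartbeat returns `"OK"`.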
We have observed some datanodes staying dead in such scenarios. Further investigation revealed that the namenode had told those datanodes to shut down.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)