Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/03/12 22:22:50 UTC
[jira] Updated: (HADOOP-5479) NameNode should not send empty block replication request to DataNode
[ https://issues.apache.org/jira/browse/HADOOP-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-5479:
----------------------------------
Attachment: numTransfers.patch
This patch has three changes:
# NameNode now interprets numOfTransfers as the number of blocks to be replicated. The current code interprets it as the number of targets to replicate to. This change is made in DatanodeDescriptor#BlockTargetPair.poll(). It prevents empty replication requests as well as empty recovery requests.
# The number of targets to be chosen is no longer capped by the number of transfers. Again, NameNode should not treat the number of transfers as the number of targets.
# The third change is not directly related to this issue, but I saw the problem while debugging it. The current code moves a block to the pending replication queue only when it reaches its replication factor. This sometimes causes over-replication because not all pending replications are tracked. This patch adds a block to the pending replication queue as soon as one replication is scheduled for it.
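To illustrate the first change, here is a minimal sketch of how a target-based budget can yield an empty request while a block-based budget cannot. It is a simplified model and not the code in numTransfers.patch: the class ReplicationQueueSketch, its methods pollByTargets and pollByBlocks, and the use of plain strings for blocks and DataNodes are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of a per-DataNode queue of replication work. */
class ReplicationQueueSketch {

    /** One block plus the pipeline of DataNodes it should be copied to. */
    static final class BlockTargetPair {
        final String block;
        final List<String> targets;

        BlockTargetPair(String block, List<String> targets) {
            this.block = block;
            this.targets = targets;
        }
    }

    private final Queue<BlockTargetPair> queue = new LinkedList<>();

    void offer(String block, List<String> targets) {
        queue.add(new BlockTargetPair(block, targets));
    }

    int size() {
        return queue.size();
    }

    /**
     * Assumed model of the old behaviour: the budget is counted in targets.
     * If the head block has more targets than the budget, nothing is dequeued
     * and an empty (non-null) list is returned, i.e. an empty request.
     */
    List<BlockTargetPair> pollByTargets(int numTargets) {
        if (numTargets <= 0 || queue.isEmpty()) {
            return null;
        }
        List<BlockTargetPair> results = new ArrayList<>();
        int remaining = numTargets;
        while (!queue.isEmpty() && remaining > 0) {
            remaining -= queue.peek().targets.size();
            if (remaining >= 0) {
                results.add(queue.poll());
            }
        }
        return results;
    }

    /**
     * Assumed model of the new behaviour: the budget is counted in blocks.
     * A non-null result always holds at least one block, however many
     * targets its pipeline has.
     */
    List<BlockTargetPair> pollByBlocks(int numBlocks) {
        if (numBlocks <= 0 || queue.isEmpty()) {
            return null;
        }
        List<BlockTargetPair> results = new ArrayList<>();
        while (!queue.isEmpty() && results.size() < numBlocks) {
            results.add(queue.poll());
        }
        return results;
    }

    public static void main(String[] args) {
        ReplicationQueueSketch q = new ReplicationQueueSketch();
        q.offer("blk_1", Arrays.asList("dn1", "dn2"));

        // Budget of one "target": the two-target block does not fit.
        System.out.println("by targets: " + q.pollByTargets(1).size() + " block(s)");
        // Budget of one block: the block is handed out with its full pipeline.
        System.out.println("by blocks:  " + q.pollByBlocks(1).size() + " block(s)");
    }
}
```

In this model the target-based poll leaves the block at the head of the queue, so the same empty result would be produced on every subsequent call with the same budget, which matches the symptom described in the issue.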
> NameNode should not send empty block replication request to DataNode
> --------------------------------------------------------------------
>
> Key: HADOOP-5479
> URL: https://issues.apache.org/jira/browse/HADOOP-5479
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Priority: Blocker
> Fix For: 0.19.2, 0.20.0, 0.21.0
>
> Attachments: numTransfers.patch
>
>
> On our production clusters, we occasionally see NameNode send an empty block replication request to a DataNode on every heartbeat, thus blocking this DataNode from replicating or deleting any block.
> This is partly caused by DataNode sending a wrong number of replications in progress, which will be fixed by HADOOP-5465. There is also a flaw on the NameNode side. NameNode should not interpret the number of replications in progress as the number of targets, since replication is done through a pipeline. It should also make sure that no empty replication request is sent to DataNode.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.