Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/03/12 22:22:50 UTC

[jira] Updated: (HADOOP-5479) NameNode should not send empty block replication request to DataNode

     [ https://issues.apache.org/jira/browse/HADOOP-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5479:
----------------------------------

    Attachment: numTransfers.patch

This patch makes three changes:
# NameNode now interprets numOfTransfers as the number of blocks to be replicated; the current code interprets it as the number of targets. This change is made in DatanodeDescriptor#BlockTargetPair.poll(), and it prevents empty replication requests as well as empty recovery requests.
# The number of targets to be chosen is no longer capped by the number of transfers. Again, NameNode should not treat the number of transfers as the number of targets.
# The third change is not directly related to this issue, but I saw the problem while debugging it. The current code moves a block to the pending replication queue only when it reaches its replication factor. Because not all pending replications are tracked, this sometimes causes over-replication. This patch adds a block to the pending replication queue as soon as any replication is scheduled for it.
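
To make the first and third changes concrete, here is a minimal sketch of the idea. It is not the code in numTransfers.patch: BlockTargets, ReplicationQueue, PendingReplications and ReplicationSketch are hypothetical names invented for illustration, and the real NameNode classes differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stand-in: one block plus the DataNodes it should be copied to. */
final class BlockTargets {
    final String blockId;
    final List<String> targets;

    BlockTargets(String blockId, List<String> targets) {
        this.blockId = blockId;
        this.targets = targets;
    }
}

/** Hypothetical per-DataNode queue of replication work kept by the NameNode. */
final class ReplicationQueue {
    private final Queue<BlockTargets> queue = new LinkedList<>();

    void offer(BlockTargets work) {
        queue.add(work);
    }

    /**
     * Dequeues at most maxBlocks entries. The limit counts blocks, not targets,
     * because one block is pushed to all of its targets through a pipeline.
     * Returns null rather than an empty list, so the caller sends no request.
     */
    List<BlockTargets> poll(int maxBlocks) {
        if (maxBlocks <= 0 || queue.isEmpty()) {
            return null;
        }
        List<BlockTargets> result = new ArrayList<>();
        while (result.size() < maxBlocks && !queue.isEmpty()) {
            result.add(queue.remove());
        }
        return result;
    }
}

/** Hypothetical tracker of in-flight replications, keyed by block. */
final class PendingReplications {
    private final Map<String, Integer> pending = new HashMap<>();

    /** Records every scheduled copy, not only the one that reaches the replication factor. */
    void add(String blockId, int numReplicas) {
        pending.merge(blockId, numReplicas, Integer::sum);
    }

    int get(String blockId) {
        return pending.getOrDefault(blockId, 0);
    }
}

final class ReplicationSketch {
    public static void main(String[] args) {
        ReplicationQueue queue = new ReplicationQueue();
        PendingReplications pending = new PendingReplications();
        queue.offer(new BlockTargets("blk_1", Arrays.asList("dn1", "dn2", "dn3")));

        // One transfer slot is enough for one block, however many targets it has.
        List<BlockTargets> work = queue.poll(1);
        if (work != null) {
            for (BlockTargets w : work) {
                pending.add(w.blockId, w.targets.size());
            }
        }
        System.out.println("pending for blk_1: " + pending.get("blk_1"));
        // The queue is now drained, so the next poll yields no request at all.
        System.out.println("next poll is null: " + (queue.poll(1) == null));
    }
}
```

In this sketch a caller that gets null from poll() simply skips the replication command for that heartbeat, which is the behaviour the first change is after.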

> NameNode should not send empty block replication request to DataNode
> --------------------------------------------------------------------
>
>                 Key: HADOOP-5479
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5479
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0, 0.21.0
>
>         Attachments: numTransfers.patch
>
>
> On our production clusters, we occasionally see the NameNode send an empty block replication request to a DataNode on every heartbeat, which blocks that DataNode from replicating or deleting any block.
> This is partly caused by the DataNode reporting a wrong number of replications in progress, which will be fixed by HADOOP-5465. There is also a flaw on the NameNode side: the NameNode should not interpret the number of replications in progress as the number of targets, since replication is done through a pipeline. It should also make sure that no empty replication request is sent to a DataNode.
