Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2008/10/21 20:01:44 UTC

[jira] Created: (HADOOP-4477) All replicas of a block end up on only 1 rack

All replicas of a block end up on only 1 rack
---------------------------------------------

                 Key: HADOOP-4477
                 URL: https://issues.apache.org/jira/browse/HADOOP-4477
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
            Priority: Critical
             Fix For: 0.20.0


The HDFS replica placement strategy guarantees that the replicas of a block exist on at least two racks when its replication factor is greater than one. But fsck still reports that the replicas of some blocks end up on a single rack.

The cause of the problem is that decommission and corruption handling check only the block's replication factor, not the rack requirement. When an over-replicated block loses a replica due to decommissioning, corruption, or a lost heartbeat, the namenode does not take any action to guarantee that the remaining replicas are on different racks.
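
To illustrate the gap, here is a hedged sketch in plain Java (the helper shape is hypothetical, not the actual FSNamesystem code): today's handling effectively asks only whether enough replicas remain, while a rack-aware check would also count the distinct racks holding them.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class RackAwareCheck {
        /**
         * Returns true if a block still needs the namenode's attention:
         * too few replicas, or enough replicas but all on a single rack.
         * replicaRacks holds the rack id of each live replica.
         */
        static boolean needsReplicationWork(List<String> replicaRacks, int replication) {
            if (replicaRacks.size() < replication) {
                return true; // under-replicated: the only case checked today
            }
            Set<String> racks = new HashSet<String>(replicaRacks);
            // The missing check: replication factor > 1, yet one rack holds everything.
            return replication > 1 && racks.size() < 2;
        }
    }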

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4477) All replicas of a block end up on only 1 rack

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642279#action_12642279 ] 

Hairong Kuang commented on HADOOP-4477:
---------------------------------------

My proposal is to include both under-replicated blocks and blocks that do not satisfy the rack requirement in the neededReplication queue. The neededReplication queue supports four priorities:
Priority 0: Blocks that have only one replica;
Priority 1: Blocks whose replicas are all on one rack;
Priority 2: Blocks whose number of replicas is no greater than 1/3 of the replication factor;
Priority 3: All other under-replicated blocks.

In general we should have a priority 4 for blocks that do not belong to priorities 0-3 yet still do not satisfy the HDFS rack requirement. Currently HDFS provides only a two-rack guarantee, so priority 1 covers all cases where the rack requirement is broken. A sketch of the priority computation follows.
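
As a sketch (illustrative Java; this signature and the exact thresholds are my assumptions, not committed code), the queue priority for a block can be derived from its live replica count and the number of distinct racks holding it:

    class ReplicationPriority {
        /**
         * Maps a block's state to a neededReplication priority.
         * Returns -1 if the block does not belong in the queue.
         */
        static int getPriority(int curReplicas, int distinctRacks, int expectedReplicas) {
            if (curReplicas <= 0) {
                return -1;                    // missing block: handled elsewhere
            } else if (curReplicas == 1) {
                return 0;                     // priority 0: only one replica left
            } else if (distinctRacks == 1 && expectedReplicas > 1) {
                return 1;                     // priority 1: all replicas on one rack
            } else if (curReplicas * 3 <= expectedReplicas) {
                return 2;                     // priority 2: <= 1/3 of replication factor
            } else if (curReplicas < expectedReplicas) {
                return 3;                     // priority 3: other under-replicated blocks
            }
            return -1;                        // enough replicas on enough racks
        }
    }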

In the FSNamesystem methods addStoredBlock, removeStoredBlock, startDecommission, and markBlockAsCorrupt, put both under-replicated blocks and blocks whose replicas are all on one rack into the neededReplication queue. In addition, the replicator will create one more replica for blocks that are on only one rack but are not under-replicated, as sketched below.
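
The four methods could share one re-evaluation hook, sketched here (the BlockView and queue shapes are stand-ins for illustration, and getPriority is the sketch above):

    class NeededReplicationHook {
        interface BlockView {
            int liveReplicas();
            int distinctRacks();
            int replicationFactor();
        }
        interface NeededReplicationQueue {
            void add(BlockView b, int priority);
        }

        private final NeededReplicationQueue neededReplications;

        NeededReplicationHook(NeededReplicationQueue q) {
            this.neededReplications = q;
        }

        /**
         * Called after any replica change (block stored or removed, datanode
         * decommissioned, replica marked corrupt). Queues the block if it is
         * under-replicated or all of its replicas sit on one rack.
         */
        void updateNeededReplications(BlockView b) {
            int priority = ReplicationPriority.getPriority(
                b.liveReplicas(), b.distinctRacks(), b.replicationFactor());
            if (priority >= 0) {
                neededReplications.add(b, priority);
            }
        }
    }

For a fully replicated block that lands in priority 1, the replication monitor would then schedule exactly one extra replica on a second rack, which is the "one more replica" behavior described above.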

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.