Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2007/01/16 20:42:27 UTC

[jira] Commented: (HADOOP-659) Boost the priority of re-replicating blocks that are far from their replication target

    [ https://issues.apache.org/jira/browse/HADOOP-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465263 ] 

Hairong Kuang commented on HADOOP-659:
--------------------------------------

The purpose of this issue is to avoid losing data, so I will prioritize blocks that have only one replica. I'd like to maintain two treesets: one containing all the blocks with one replica, the other containing the remaining under-replicated blocks. Both treesets map a block ID to a block, so both add and remove operations are O(log n). I feel that maintaining a total order is unnecessary and expensive.
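
A minimal sketch of the two-set idea, not the actual name-node code. The class name NeededReplications, the stand-in Block type, and the method nextToReplicate are hypothetical, and java.util.TreeMap is assumed here because the sets are described as mapping a block ID to a block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of two sorted collections of under-replicated blocks. */
public class NeededReplications {
    /** Minimal stand-in for a DFS block (assumed shape, illustration only). */
    public static final class Block {
        public final long blockId;
        public Block(long blockId) { this.blockId = blockId; }
    }

    // Blocks with exactly one live replica: these are replicated first.
    private final TreeMap<Long, Block> singleReplica = new TreeMap<>();
    // All other under-replicated blocks.
    private final TreeMap<Long, Block> others = new TreeMap<>();

    /** O(log n): file the block under the collection matching its replica count. */
    public void add(Block b, int liveReplicas) {
        remove(b); // a block lives in at most one collection
        if (liveReplicas == 1) {
            singleReplica.put(b.blockId, b);
        } else {
            others.put(b.blockId, b);
        }
    }

    /** O(log n): drop the block from whichever collection holds it. */
    public boolean remove(Block b) {
        return singleReplica.remove(b.blockId) != null
            || others.remove(b.blockId) != null;
    }

    /** Returns up to max blocks, single-replica blocks first. */
    public List<Block> nextToReplicate(int max) {
        List<Block> out = new ArrayList<>();
        for (Block b : singleReplica.values()) {
            if (out.size() >= max) return out;
            out.add(b);
        }
        for (Block b : others.values()) {
            if (out.size() >= max) return out;
            out.add(b);
        }
        return out;
    }

    public int size() { return singleReplica.size() + others.size(); }
}
```

Within each collection the blocks are ordered only by block ID, so there is no total priority order to maintain; the only ordering guarantee is that single-replica blocks come out before the rest.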

> Boost the priority of re-replicating blocks that are far from their replication target
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-659
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.7.2
>            Reporter: Konstantin Shvachko
>         Assigned To: Hairong Kuang
>
> I see two types of replications that should be accelerated compared to all others.
> 1. Blocks that have only one remaining copy (but are required to have higher replication).
> 2. Blocks that have less than 1/3 of their replicas in place.
> The latter occurs when map/reduce sets replication of certain files to 10, and we want
> it to happen fast to achieve better performance on the tasks.
> So I think we should distinguish two major groups of under-replicated blocks:
> first-priority (having only 1 copy or less than 1/3 of required replicas), and the rest.
> The name-node places first-priority blocks into the beginning of the neededReplication
> list, and the rest are placed at the end. That way the first-priority blocks will be replicated
> first and then the others.
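
The first-priority rule in the quoted description can be sketched as follows. This is an illustration under assumptions, not the name-node implementation: the class ReplicationPriority, its method names, and the use of a java.util.Deque of block IDs to stand in for the neededReplication list are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the first-priority test from the issue description. */
public class ReplicationPriority {
    /**
     * True if the block is under-replicated and either has exactly one copy
     * or has fewer than 1/3 of its target replicas in place.
     */
    public static boolean isFirstPriority(int liveReplicas, int targetReplicas) {
        if (liveReplicas >= targetReplicas) {
            return false; // not under-replicated at all
        }
        // Integer form of liveReplicas < targetReplicas / 3, avoiding rounding.
        return liveReplicas == 1 || liveReplicas * 3 < targetReplicas;
    }

    /** First-priority blocks go to the head of the queue, the rest to the tail. */
    public static void enqueue(Deque<Long> neededReplication, long blockId,
                               int liveReplicas, int targetReplicas) {
        if (isFirstPriority(liveReplicas, targetReplicas)) {
            neededReplication.addFirst(blockId);
        } else {
            neededReplication.addLast(blockId);
        }
    }

    public static void main(String[] args) {
        Deque<Long> needed = new ArrayDeque<>();
        enqueue(needed, 1L, 2, 3);   // ordinary under-replication: tail
        enqueue(needed, 2L, 3, 10);  // fewer than 1/3 of 10 replicas: head
        System.out.println(needed);
    }
}
```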

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira