Posted to common-dev@hadoop.apache.org by "lohit vijayarenu (JIRA)" <ji...@apache.org> on 2008/04/17 07:11:21 UTC

[jira] Updated: (HADOOP-2559) DFS should place one replica per rack

     [ https://issues.apache.org/jira/browse/HADOOP-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-2559:
-------------------------------------

     Description: 
Currently, when writing out a block, dfs places one copy on a local data node, one copy on a rack-local node,
and a third on a remote node. This leads to a number of undesirable properties:

1. The block will be rack-local to only two racks instead of three, reducing the advantage of rack-locality-based scheduling by 1/3.

2. The blocks of a file (especially a large file) are unevenly distributed over the nodes: one third will be on the local node, and two thirds on nodes in the same rack. This may fill some nodes much faster than others,
increasing the need for rebalancing. Furthermore, it also makes some nodes "hot spots" if those big
files are popular and accessed by many applications.




  was:

Currently, when writing out a block, dfs places one copy on a local data node, one copy on a rack-local node,
and a third on a remote node. This leads to a number of undesirable properties:

1. The block will be rack-local to only two racks instead of three, reducing the advantage of rack-locality-based scheduling by 1/3.

2. The blocks of a file (especially a large file) are unevenly distributed over the nodes: one third will be on the local node, and two thirds on nodes in the same rack. This may fill some nodes much faster than others,
increasing the need for rebalancing. Furthermore, it also makes some nodes "hot spots" if those big
files are popular and accessed by many applications.




    Release Note: Change DFS block placement to allocate the first replica locally, the second off-rack, and the third intra-rack from the second.
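The new policy in the release note can be sketched as follows. This is a hypothetical, simplified illustration, not the actual Hadoop placement code: the `Node` class and `choosePlacement` method are invented for this sketch, and real DFS placement also weighs node load and free space.

```java
import java.util.ArrayList;
import java.util.List;

// A node is identified by its name and the rack it sits in (illustrative only).
class Node {
    final String name;
    final String rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
}

class PlacementSketch {
    // Pick three replica targets for a block written from 'writer':
    //   1st replica: the writer's local node,
    //   2nd replica: a node on a different rack,
    //   3rd replica: another node on the 2nd replica's rack.
    static List<Node> choosePlacement(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);                          // 1st replica: local node

        Node offRack = null;
        for (Node n : cluster) {                      // 2nd replica: off-rack
            if (!n.rack.equals(writer.rack)) { offRack = n; break; }
        }
        if (offRack == null) return targets;          // single-rack cluster: give up
        targets.add(offRack);

        for (Node n : cluster) {                      // 3rd replica: same rack as 2nd
            if (n != offRack && n.rack.equals(offRack.rack)) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }
}
```

With this scheme a block's three replicas span two racks in a 1+2 split, so losing either rack still leaves at least one replica, while only one copy crosses the inter-rack link during the write.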

> DFS should place one replica per rack
> -------------------------------------
>
>                 Key: HADOOP-2559
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2559
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: lohit vijayarenu
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2559-1-2.patch, HADOOP-2559-1-3.patch, HADOOP-2559-1-4.patch, HADOOP-2559-1.patch, HADOOP-2559-1.patch, HADOOP-2559-2.patch, Patch1_Block_Report.png.jpg, Patch1_Rack_Node_Mapping.jpg, Patch2 Block Report.jpg, Patch2_Rack_Node_Mapping.jpg, Trunk_Block_Report.png, Trunk_Rack_Node_Mapping.jpg
>
>
> Currently, when writing out a block, dfs places one copy on a local data node, one copy on a rack-local node,
> and a third on a remote node. This leads to a number of undesirable properties:
> 1. The block will be rack-local to only two racks instead of three, reducing the advantage of rack-locality-based scheduling by 1/3.
> 2. The blocks of a file (especially a large file) are unevenly distributed over the nodes: one third will be on the local node, and two thirds on nodes in the same rack. This may fill some nodes much faster than others,
> increasing the need for rebalancing. Furthermore, it also makes some nodes "hot spots" if those big
> files are popular and accessed by many applications.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.