Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/04/25 20:01:55 UTC

[jira] Commented: (HADOOP-2094) DFS should not use round robin policy in determining on which volume (file system partition) to allocate for the next block

    [ https://issues.apache.org/jira/browse/HADOOP-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592452#action_12592452 ] 

Runping Qi commented on HADOOP-2094:
------------------------------------


By analyzing disk utilization data, we found that the four disks on each node were not evenly utilized.
The first disk was the most heavily utilized, which is consistent with the potential impact of the current policy for
volume selection for a new block on data nodes.


> DFS should not use round robin policy in determining on which volume (file system partition) to allocate for the next block
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2094
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2094
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: dhruba borthakur
>         Attachments: randomDatanodePartition.patch
>
>
> When multiple file system partitions are configured for the data storage of a data node,
> it uses a strict round robin policy to decide which partition to use for writing the next block.
> This may result in anomalous cases in which the blocks of a file are not evenly distributed across
> the partitions. For example, when we use distcp to copy files with four mappers running concurrently on each node,
> those 4 mappers write to DFS at about the same rate. Thus, it is possible that the 4 mappers write out
> their blocks in an interleaved fashion. If there are 4 file system partitions configured for the local data node, each mapper may
> continue to write its blocks to the same file system partition.
> A simple random placement policy avoids such anomalous cases and has no obvious drawbacks.
>  
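The difference between the two policies can be sketched as follows. This is a hypothetical simplification in Java, not the actual FSDataset code from the attached patch; the class and method names are illustrative only:

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the two volume-selection policies discussed above.
// Not the actual Hadoop datanode code; names are illustrative.
public class VolumeChooser {
    private final Random random = new Random();
    private int rrIndex = 0; // state for the round-robin policy

    // Current (round robin) policy: each writer cycles through the volumes
    // in a fixed order. Concurrent writers that proceed in lockstep can end
    // up repeatedly landing on the same volume for all of their blocks.
    public <V> V chooseRoundRobin(List<V> volumes) {
        V chosen = volumes.get(rrIndex);
        rrIndex = (rrIndex + 1) % volumes.size();
        return chosen;
    }

    // Proposed policy: pick a volume uniformly at random, which breaks the
    // lockstep pattern between concurrent writers and evens out usage in
    // expectation.
    public <V> V chooseRandom(List<V> volumes) {
        return volumes.get(random.nextInt(volumes.size()));
    }
}
```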

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.