Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2006/06/14 03:51:30 UTC

[jira] Commented: (HADOOP-296) Do not assign blocks to a datanode with < x mb free

    [ http://issues.apache.org/jira/browse/HADOOP-296?page=comments#action_12416109 ] 

Konstantin Shvachko commented on HADOOP-296:
--------------------------------------------

If you look further down in FSNamesystem.chooseTarget() there is code that selects nodes that have space 
for at least MIN_BLOCKS_FOR_WRITE (5 by default) blocks.
Then, when data nodes calculate their remaining disk size (see FSDataset.getRemaining()), they use USABLE_DISK_PCT (98%) 
and the value of the member FSDataset.reserved, which starts at 0 and then reflects the amount of space allocated 
to the ongoing block creates. 
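
To make the current behavior concrete, here is a rough sketch of those two checks. This is simplified for illustration only; the class, method, and parameter names below are placeholders, not the actual FSNamesystem/FSDataset code:

  // Illustrative sketch only -- not the real Hadoop code.
  class BlockPlacementSpaceSketch {
    static final int MIN_BLOCKS_FOR_WRITE = 5;    // name node default mentioned above
    static final float USABLE_DISK_PCT = 0.98f;   // data node default mentioned above

    // Data node view: only 98% of the raw capacity counts as usable, minus
    // the space already tracked in 'reserved' for ongoing block creates.
    static long getRemaining(long capacity, long used, long reserved) {
      long remaining = (long) (capacity * USABLE_DISK_PCT) - used - reserved;
      return Math.max(remaining, 0L);
    }

    // Name node view: a data node is an acceptable target only if it still
    // has room for at least MIN_BLOCKS_FOR_WRITE blocks.
    static boolean hasEnoughSpace(long remaining, long blockSize) {
      return remaining > (long) MIN_BLOCKS_FOR_WRITE * blockSize;
    }
  }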

I think we should let individual data nodes control the amount of space they need or want to reserve, 
rather than enforcing it uniformly on the name node for all data nodes. 
This would solve your problem of configuring machines with very different disk capacities 
in the same cluster.

So I propose adding two new configuration parameters for data nodes:
1) dfs.datanode.du.pct, a configurable variant of USABLE_DISK_PCT.
2) dfs.datanode.du.reserved, which specifies the amount of space that should always remain free on the node.
Then at startup FSDataset.reserved can be set to dfs.datanode.du.reserved rather than 0, 
and USABLE_DISK_PCT can be replaced by dfs.datanode.du.pct.
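
For illustration, the data node side could pick these up at startup roughly like this. It is a sketch of the intended semantics, not a patch; it assumes the standard Configuration getters (getFloat/getLong), and the class and field names are placeholders rather than the real FSDataset code:

  import org.apache.hadoop.conf.Configuration;

  // Sketch only: intended semantics of the two proposed parameters.
  class DataNodeDiskSettingsSketch {
    float usableDiskPct;  // configurable replacement for USABLE_DISK_PCT
    long reserved;        // plays the role of FSDataset.reserved

    DataNodeDiskSettingsSketch(Configuration conf) {
      // Defaults keep the current behavior: 98% usable, nothing reserved.
      usableDiskPct = conf.getFloat("dfs.datanode.du.pct", 0.98f);
      // Start 'reserved' at the configured floor instead of 0; it would
      // still grow as space is allocated for ongoing block creates.
      reserved = conf.getLong("dfs.datanode.du.reserved", 0L);
    }

    long getRemaining(long capacity, long used) {
      long remaining = (long) (capacity * usableDiskPct) - used - reserved;
      return Math.max(remaining, 0L);
    }
  }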


> Do not assign blocks to a datanode with < x mb free
> ---------------------------------------------------
>
>          Key: HADOOP-296
>          URL: http://issues.apache.org/jira/browse/HADOOP-296
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.3.2
>     Reporter: Johan Oskarson
>  Attachments: minspace.patch
>
> We're running a smallish cluster with very different machines, some with only 60 GB hard drives.
> This creates a problem when inserting files into the DFS: these machines run out of space quickly and then cannot run any map/reduce operations.
> A solution would be to not assign any new blocks once the free space falls below a certain user-configurable threshold.
> The remaining free space could then be used by the map/reduce operations instead (if they run on the same disk).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira