Posted to common-user@hadoop.apache.org by Taeho Kang <tk...@gmail.com> on 2008/03/06 03:29:26 UTC

Rebalance datablocks among multiple HDDs in a datanode

Hello all,

Is there a feature or other way to rebalance datablocks among multiple HDDs
within a datanode?
For example, if you add a new HDD to a datanode, the usage levels across
that datanode's HDDs become unbalanced.

Thank you in advance,

Regards,
Taeho Kang

Re: Rebalance datablocks among multiple HDDs in a datanode

Posted by Ted Dunning <td...@veoh.com>.

Just use a standard rebalancing script and the empty node will fill in
quickly enough.

The most common approach to rebalancing is to iterate through the files in
your system, raise the replication factor substantially for about a minute,
and then drop it back down.  It helps to overlap the high-replication period
across many files.  The idea is that the new replicas are placed according to
available space, and then one replica is randomly deleted when you decrease
the replication factor.
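
A minimal sketch of that approach in Python, shelling out to the
`hadoop fs -setrep` command. The function names, the replication values
(3 and 6), and the injectable `run` and `sleep` parameters are assumptions
made for illustration; only the "about a minute" hold time comes from the
description above. All files are boosted first and lowered afterwards so
that their high-replication periods overlap.

```python
import subprocess
import time


def setrep_command(path, replication):
    """Build the argument list for `hadoop fs -setrep <rep> <path>`."""
    return ["hadoop", "fs", "-setrep", str(replication), path]


def rebalance(paths, normal_rep=3, boosted_rep=6, hold_seconds=60,
              run=subprocess.check_call, sleep=time.sleep):
    """Temporarily raise replication on every path, then restore it.

    normal_rep and boosted_rep are illustrative values, not recommendations.
    run and sleep are injectable so the logic can be exercised without a
    cluster.
    """
    paths = list(paths)  # iterated twice, so materialize any generator

    # Phase 1: boost every file before waiting, so the periods overlap.
    for path in paths:
        run(setrep_command(path, boosted_rep))

    # Phase 2: give the cluster time to create the extra replicas.
    sleep(hold_seconds)

    # Phase 3: drop back down; HDFS then deletes the excess replicas.
    for path in paths:
        run(setrep_command(path, normal_rep))
```

For example, `rebalance(["/user/data/part-00000"])` would issue two
`hadoop fs -setrep` calls about a minute apart; the path here is hypothetical.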

On 3/5/08 6:29 PM, "Taeho Kang" <tk...@gmail.com> wrote:

> Hello all,
> 
> Is there a feature / way to rebalance datablocks among multiple HDD's within
> a datanode?
> For example, if you added a new HDD to a datanode, then the usage levels
> among the datanode's HDD would be unbalanced.
> 
> Thank you in advance,
> 
> Regards,
> Taeho Kang