Posted to hdfs-dev@hadoop.apache.org by Doug Balog <do...@dugos.com> on 2010/08/25 17:13:58 UTC

Rebalancing data across partitions on a datanode.

We've just added a couple of new drives to our datanodes. 
Each new drive has a single filesystem, which we added to dfs.data.dir and mapred.{local,tmp}.dir.
Now I want to rebalance the data across the new filesystems so that they are equally utilized.
My plan is to write a script that does the following (there's a rough sketch of it after the list).

- Calculate how much data each filesystem should have.
- While the filesystems are not balanced:
	- Randomly pick a block file and its .meta file from a filesystem that is over-utilized.
	- Copy them to a temporary name on an under-utilized filesystem.
	- Rename the files from the temporary name to their proper location on the under-utilized filesystem.
	- Remove the files from the over-utilized filesystem.
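For concreteness, here's the kind of bash sketch I have in mind. The volume paths and the 5% threshold are made-up examples, it assumes GNU shuf is available, and it just drops blocks flat into current/ rather than preserving the subdir fan-out; treat it as pseudocode for the steps above, not a tested tool.

    #!/bin/bash
    # Rough sketch only -- volume paths and threshold are made-up examples.
    VOLS="/data1/dfs/data /data2/dfs/data /data3/dfs/data"
    THRESHOLD=5   # stop once volumes are within 5 percentage points of each other

    # Percent used for the filesystem holding $1 (column 5 of POSIX df output).
    usage() { df -P "$1" | awk 'NR==2 {print $5}' | tr -d '%'; }

    while true; do
        # Find the most and least utilized volumes.
        over=""; under=""; max=0; min=100
        for v in $VOLS; do
            u=$(usage "$v")
            [ "$u" -gt "$max" ] && { max=$u; over=$v; }
            [ "$u" -lt "$min" ] && { min=$u; under=$v; }
        done
        [ $((max - min)) -le "$THRESHOLD" ] && break

        # Randomly pick one block file (GNU shuf) and find its .meta companion.
        blk=$(find "$over/current" -name 'blk_*' ! -name '*.meta' | shuf -n 1)
        [ -n "$blk" ] || break
        meta=$(ls "${blk}"_*.meta) || break

        # Copy to a temp name on the emptier volume, rename, then remove the original.
        for f in "$blk" "$meta"; do
            base=$(basename "$f")
            cp "$f" "$under/current/.tmp_$base" &&
              mv "$under/current/.tmp_$base" "$under/current/$base" &&
              rm "$f"
        done
    done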

I think this will work because I believe the datanode tries to open a block
on each of its filesystems until it succeeds, rather than keeping an in-memory
record of which filesystem each block lives on.
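As a sanity check on the layout, the blocks are just plain files named blk_<id> with a blk_<id>_<genstamp>.meta beside them, so something like this (paths again just examples) shows where each block currently lives:

    for v in /data1/dfs/data /data2/dfs/data; do
        echo "== $v =="
        find "$v/current" -name 'blk_*' ! -name '*.meta' | head -3
    done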

Will this work?
What are the gotchas that I have to watch out for?

Thanks,

Doug