Posted to common-dev@hadoop.apache.org by Timothy Chklovski <ti...@isi.edu> on 2007/04/05 23:58:12 UTC

Dynamic addition and removal of Hadoop nodes

 Hello,

We have been experimenting with Hadoop on a largish but shared cluster.
That means we can allocate various nodes, but we would also like to let others
use them (so not holding a node permanently is a bit like the situation on EC2).
We are interested in whether other users have developed approaches to get
machines to join (and leave) both the DFS and Tasktracker pools.

It does not seem very complicated, but we are wondering if the brute-force
approach ignores some arcana, e.g. whether refreshes should be called on the
namenode and the jobtracker.

Also, if we know a node will leave the pool, is there something we can tell
the namenode and the jobtracker in advance to make the departure less
disruptive (e.g. stop accepting new large jobs, or even go into safe mode)?

-> If people have developed approaches to automating how machines join and
leave the pools, we'd love to know.
-> Furthermore, if it makes sense, please consider it a feature request that
this be automated/wrapped in scripts shipped with the Hadoop distribution
(or, if everything already works, that the documentation be extended to cover
how to accomplish this correctly).

Thanks much for Hadoop & continued work on it!

-- Tim

-- 
Timothy Chklovski
Senior Research Scientist
USC Information Sciences Institute
timc@isi.edu
310.448.8763

Re: Dynamic addition and removal of Hadoop nodes

Posted by Doug Cutting <cu...@apache.org>.
Timothy Chklovski wrote:
> We are interested in whether other users have developed approaches to get
> machines to join (and leave) both the DFS and Tasktracker pools.
> 
> It does not seem very complicated, but we are wondering if the brute-force
> approach ignores some arcana, e.g. whether refreshes should be called on the
> namenode and the jobtracker.

Brute force should be effective for tasktrackers, but one should be more 
careful with datanodes to avoid data loss.
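
For the tasktrackers, the per-node sequence could be as simple as the
following sketch (it assumes the node's conf/hadoop-site.xml already points
mapred.job.tracker at the jobtracker, and uses the stock bin/ scripts):

   # On the machine joining the pool:
   bin/hadoop-daemon.sh start tasktracker

   # To pull the machine back out of the pool; tasks still running on it
   # will be re-run elsewhere by the jobtracker:
   bin/hadoop-daemon.sh stop tasktracker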

To decommission datanodes, use the dfs.hosts and dfs.hosts.exclude 
configuration parameters.  Remove nodes to be decommissioned using these 
files, then use 'bin/hadoop dfsadmin -refreshNodes' to cause the 
namenode to re-read them.  Finally, wait until 'dfsadmin -report' shows 
that the requested nodes are decommissioned before killing their 
datanode processes.
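
Concretely, the sequence might look something like this (the hostname and
exclude-file path below are made up; dfs.hosts.exclude must already name
that file in the namenode's configuration):

   # 1. List the datanode(s) to retire in the exclude file:
   echo "node42.example.com" >> /path/to/conf/exclude

   # 2. Make the namenode re-read dfs.hosts / dfs.hosts.exclude:
   bin/hadoop dfsadmin -refreshNodes

   # 3. Wait until the report shows the node as decommissioned, i.e. its
   #    blocks have been re-replicated to the remaining datanodes:
   bin/hadoop dfsadmin -report

   # 4. Only then stop the datanode process on that machine:
   bin/hadoop-daemon.sh stop datanode

Adding a datanode is the easy direction: start the datanode daemon on a
machine whose configuration points fs.default.name at the namenode (and
which is listed in dfs.hosts, if you use one) and it will register itself.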

Doug