Posted to common-user@hadoop.apache.org by "Hargraves, Alyssa" <al...@WPI.EDU> on 2009/01/12 20:52:09 UTC

Dynamic Node Removal and Addition

Hello everyone,

I have a question and was hoping someone on the mailing list could offer some pointers. I'm working on a project with another student, and for part of this project we are trying to create something that will allow nodes to be added to and removed from the Hadoop cluster at will.  The goal is to have each node run a program that gives its user the freedom to add the machine to the cluster or remove it, so that the cluster can take advantage of a workstation when the user leaves (or while they're at the PC, if they'd like it running anyway).  This would be on computers running various versions of Windows.

From what we can find, Hadoop does not already support this feature, but it does seem to support dynamically adding and removing nodes in other ways.  For example, to add a node, one would have to make sure Hadoop is set up on the PC along with Cygwin, Java, and ssh, but after that initial setup it's just a matter of adding the PC to the conf/slaves file, making sure the node is not listed in the exclude file, and running the start datanode and start tasktracker commands from the node you are adding (basically described in FAQ item 25).  To remove a node, it seems to be just a matter of adding it to the exclude file (the file named by dfs.hosts.exclude) and refreshing the list of nodes (described in Hadoop FAQ item 17).
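The add and remove steps described above can be sketched as a small helper. This is only a rough sketch under stated assumptions: the script paths bin/hadoop-daemon.sh and bin/hadoop are taken to be relative to the Hadoop install directory, the exclude file path is whatever dfs.hosts.exclude points to, and the function names are hypothetical. The helper only edits the exclude file and builds the command lines; it does not run them.

```python
from pathlib import Path

# Assumed script locations, relative to the Hadoop install directory.
HADOOP_DAEMON = "bin/hadoop-daemon.sh"
HADOOP_CLI = "bin/hadoop"


def read_hosts(exclude_file):
    """Return the hostnames listed in the exclude file, one per line."""
    path = Path(exclude_file)
    if not path.exists():
        return []
    return [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]


def set_excluded(exclude_file, host, excluded):
    """Add `host` to or drop it from the exclude file; return the new list."""
    hosts = [h for h in read_hosts(exclude_file) if h != host]
    if excluded:
        hosts.append(host)
    Path(exclude_file).write_text("".join(h + "\n" for h in hosts))
    return hosts


def join_commands():
    """Commands to run on the joining node itself (hypothetical helper)."""
    return [
        [HADOOP_DAEMON, "start", "datanode"],
        [HADOOP_DAEMON, "start", "tasktracker"],
    ]


def refresh_command():
    """Command to run against the namenode after editing the exclude file."""
    return [HADOOP_CLI, "dfsadmin", "-refreshNodes"]
```

A wrapper program could pass each returned list to subprocess.run on the appropriate machine: the join commands on the workstation being added, the refresh command where the namenode configuration lives.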

Our question is whether or not a simple interface for this already exists, and whether or not anyone sees any potential flaws with how we are planning to accomplish these tasks.  From our research we were not able to find anything that already exists for this purpose, but we find it surprising that an interface for this would not already exist.  We welcome any comments, recommendations, and insights anyone might have for accomplishing this task.

Thank you,
Alyssa Hargraves
Patrick Crane
WPI Class of 2009

Re: Dynamic Node Removal and Addition

Posted by Rasit OZDAS <ra...@gmail.com>.
Hi Alyssa,

http://markmail.org/message/jyo4wssouzlb4olm#query:%22Decommission%20of%20datanodes%22+page:1+mid:p2krkt6ebysrsrpl+state:results
As pointed out in the thread above, decommissioning (removing) datanodes
was not an easy job as of version 0.12, and I strongly suspect it still
is not.
As far as I know, a node is normally used as both datanode and tasktracker.
So the performance loss will possibly be far greater than the performance
gain of your design.
My solution would be to keep the machines running as datanodes and to change
the TaskTracker code a little bit so that they won't be used for jobs. I
assume the code change should be easy.
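
One rough way to approximate this without touching the TaskTracker source is to leave the datanode daemon running and start or stop only the tasktracker daemon. Note that this swaps a daemon start/stop in for the code change suggested above; it is not the same technique. The sketch assumes bin/hadoop-daemon.sh is relative to the Hadoop install directory, and the function name and its parameters are hypothetical.

```python
# Assumed script location, relative to the Hadoop install directory.
HADOOP_DAEMON = "bin/hadoop-daemon.sh"


def tasktracker_command(user_present, share_while_present=False):
    """Build the tasktracker command for the current workstation state.

    The datanode is never touched, so the node's HDFS blocks stay available.
    """
    accept_tasks = (not user_present) or share_while_present
    action = "start" if accept_tasks else "stop"
    return [HADOOP_DAEMON, action, "tasktracker"]
```

Stopping a tasktracker while tasks are running on it would most likely cause those tasks to be rescheduled on other nodes, so the stop should probably not be triggered too eagerly.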

Hope this helps,
Rasit


-- 
M. Raşit ÖZDAŞ