Posted to common-user@hadoop.apache.org by Ricky Ho <rh...@adobe.com> on 2008/11/26 17:10:03 UTC
Highly dynamic Hadoop Cluster
Does Hadoop support an environment where nodes join and leave without a preconfigured file like "hadoop-site.xml"? The characteristic is that none of the machines' IP addresses or node names are stable: they change whenever a machine reboots after a crash.
Until now I have used a simple approach: configure my hadoop-site.xml and let the startup scripts take care of everything. But that doesn't seem to work in the dynamic-IP scenario. Can someone suggest how to deal with it?
Here are the considerations ...
Startup Discovery Scenario
===========================
How does a NameNode learn about a newly joined DataNode?
How does a new DataNode learn about the existing NameNode?
How does a JobTracker learn about a newly joined TaskTracker?
How does a new TaskTracker learn about the existing JobTracker?
Fail Recovery Scenario
=======================
Let's say a NameNode crashes and another NameNode (at a different address) starts up. How does the new NameNode learn about the other DataNodes?
How do the other DataNodes learn about the new NameNode?
Let's say a JobTracker crashes and another JobTracker (at a different address) starts up. How does the new JobTracker learn about the other TaskTrackers?
How do the other TaskTrackers learn about the new JobTracker?
Let's say a DataNode crashes and another DataNode (at a different address) starts up. How does the new DataNode learn about the existing NameNode?
How does the existing NameNode learn about the new DataNode?
Let's say a TaskTracker crashes and another TaskTracker (at a different address) starts up. How does the new TaskTracker learn about the existing JobTracker?
How does the existing JobTracker learn about the new TaskTracker?
Rgds,
Ricky
Re: Highly dynamic Hadoop Cluster
Posted by Steve Loughran <st...@apache.org>.
Ricky Ho wrote:
> Does Hadoop support the environment where nodes join and leave without a preconfigured file like "hadoop-site.xml" ? [...]
>
You need something to do the discovery, so the nodes can find their settings.
- We use Anubis - http://wiki.smartfrog.org/wiki/display/sf/Anubis - it
works in places where multicast works.
- ZooKeeper may work here too; you should look at that.
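ZooKeeper-style discovery is usually built on ephemeral registration: each DataNode or TaskTracker registers its current address under a well-known path and keeps its session alive with heartbeats, and the master lists the live entries to find its workers. Here is a toy in-process Python sketch of that pattern; the Registry class, the paths, the addresses, and the timeout are all made up for illustration (real code would use the ZooKeeper client API against a running ensemble):

```python
class Registry:
    """Toy in-process sketch of ephemeral registration, the pattern
    ZooKeeper provides. Entries whose last heartbeat is older than the
    session timeout are treated as gone, so the master's view tracks
    live nodes even as addresses change across restarts."""

    def __init__(self, session_timeout=10.0):
        self.session_timeout = session_timeout
        self.nodes = {}  # path -> (address, time of last heartbeat)

    def register(self, path, address, now):
        # A worker calls this on startup and again on every heartbeat;
        # after a crash-and-restart it simply registers its new address.
        self.nodes[path] = (address, now)

    def live(self, prefix, now):
        # The master lists workers whose sessions are still alive.
        return sorted(addr for path, (addr, t) in self.nodes.items()
                      if path.startswith(prefix)
                      and now - t < self.session_timeout)

reg = Registry(session_timeout=10.0)
reg.register("/datanodes/dn1", "10.0.0.5:50010", now=0.0)
reg.register("/datanodes/dn2", "10.0.0.6:50010", now=0.0)
print(reg.live("/datanodes/", now=5.0))   # both still within the session timeout
# dn1 crashes and stops heartbeating; dn2 keeps renewing its session.
reg.register("/datanodes/dn2", "10.0.0.6:50010", now=12.0)
print(reg.live("/datanodes/", now=15.0))  # dn1's session has expired
```

The same pattern answers both directions of the discovery questions above: workers find the master at a fixed, well-known path, and the master finds workers by listing the registration path, so no address needs to be preconfigured.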
- I caught an interesting talk recently where someone on EC2 used
SimpleDB as the node registration API. EC2 doesn't support multicast IP,
so instead every node talks to a SimpleDB table, registers there, and
looks up its peers. You could push out a site.xml equivalent there too.
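Pushing out a site.xml equivalent could look like the sketch below: once a node has looked up the current master addresses from whatever registry it uses (SimpleDB, ZooKeeper, DNS), it renders a hadoop-site.xml before starting its daemons. The render_site helper and the addresses are hypothetical; the fs.default.name and mapred.job.tracker property names are the standard Hadoop ones:

```python
# Hypothetical sketch: render hadoop-site.xml from freshly looked-up
# master addresses, so nothing is preconfigured on the node itself.
SITE_TEMPLATE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{namenode}/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>{jobtracker}</value>
  </property>
</configuration>
"""

def render_site(namenode, jobtracker):
    # The addresses come from the registry lookup, not from a static file.
    return SITE_TEMPLATE.format(namenode=namenode, jobtracker=jobtracker)

xml = render_site("10.0.1.20:9000", "10.0.1.21:9001")
print(xml)
```

A startup wrapper would write this file into the Hadoop conf directory and then launch the daemon, re-running the lookup whenever the daemon restarts.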
If you can use dynamic DNS, your life is fairly simple. Your NameNode and
JobTracker need to register with the DNS servers; everything else needs
to pick them up. You will need to run Hadoop with limited caching of
valid hostnames, though, so that the changed addresses are picked up
after a server restart.
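One way to limit hostname caching is through the JVM's DNS cache settings; this is a sketch under the assumption that the standard Java networking properties apply to your Hadoop daemons (some Hadoop code paths cache resolved addresses on their own, so treat this as a starting point rather than a complete fix):

```properties
# In $JAVA_HOME/jre/lib/security/java.security (affects every JVM on
# the box), cap how long successful and failed lookups are cached:
networkaddress.cache.ttl=60
networkaddress.cache.negative.ttl=10
```

Alternatively, the equivalent system property (e.g. -Dsun.net.inetaddr.ttl=60) can be added to the daemon launch options for a per-process setting.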
-steve