You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by tim robertson <ti...@gmail.com> on 2008/11/26 12:53:53 UTC

Memory allocation - please confirm

Hi,

Could you please sanity check this:

In Hadoop-site.xml I add:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1G</value>
  <description>Increasing the size of the heap to allow for large in
memory index of polygons</description>
</property>

Is this all required to increase the -Xmx for processes running Maps?

(During Mapper.configure() I build a large hashtree lookup object...)

Many thanks

Tim

Re: Highly dynamic Hadoop Cluster

Posted by Steve Loughran <st...@apache.org>.

Ricky Ho wrote:
> Does Hadoop support the environment where nodes join and leave without a preconfigured file like "hadoop-site.xml" ?  The characteristic is that none of the IP addresses and node names of any machines are stable.  They will change after the machine is reboot after crash.
> 
> Before that, I use a simple way of just configuring my hadoop-site.xml and use the startup scripts that takes care of everything.  But for the dynamic IP address scenario, that doesn't seem to work.  Can someone suggest a solution how to deal with this scenario ?
> 
> Here are the considerations ...
> 
> Startup Discovery Scenario
> ===========================
> How does a NameNode knows a newly joined DataNode ?
> How does a new DataNode knows the existing NameNode ?
> How does a JobTracker knows a newly joined TaskTracker ?
> How does a new TaskTracker knows the existing JobTracker ?
> 
> Fail Recovery Scenario
> =======================
> Lets say a NameNode crash, and then another NameNode (at a different address) starts up.  How does the new NameNode learnt about other DataNodes ?
> How does other DataNodes learn about this new NameNode ?
> 
> Lets say a JobTracker crash, and then another JobTracker (at a different address) starts up.  How does the new JobTracker learnt about other TaskTrackers ?
> How does other TaskTrackers learn about this new JobTracker ?
> 
> Lets say a DataNode crash, and then another DataNode (at a different address) starts up.  How does the new DataNode learnt about the existing NameNode ?
> How does the existing NameNode learn about this new DataNode ?
> 
> Lets say a TaskTracker crash, and then another TaskTracker (at a different address) starts up.  How does the new TaskTracker learnt about the existing JobTracker ?
> How does the existing JobTracker learn about this new TaskTracker ?
> 

You need something to do the discovery, for them to find their settings.

- we use Anubis - http://wiki.smartfrog.org/wiki/display/sf/Anubis - it 
works in places where multicast works
- Zookeeper may work here too; you should look at that
- I caught an interesting talk recently where someone on EC2 used 
SimpleDB as the node registration API. EC2 doesn't support multicast Ip, 
so instead every node talks to a simpledb table and registers there, 
looks up its peers. You could push out a site.xml equivalent there too.

If you can use dynamic dns your life is fairly simple. Your namenode and 
job tracker need to register with the DNS servers; everything else needs 
to pick them up. You will need to run Hadoop with limited caching of 
valid hostnames though, so that after a server restart the changed 
addresses are picked up.

-steve

Highly dynamic Hadoop Cluster

Posted by Ricky Ho <rh...@adobe.com>.

Does Hadoop support the environment where nodes join and leave without a preconfigured file like "hadoop-site.xml" ?  The characteristic is that none of the IP addresses and node names of any machines are stable.  They will change after the machine is reboot after crash.

Before that, I use a simple way of just configuring my hadoop-site.xml and use the startup scripts that takes care of everything.  But for the dynamic IP address scenario, that doesn't seem to work.  Can someone suggest a solution how to deal with this scenario ?

Here are the considerations ...

Startup Discovery Scenario
===========================
How does a NameNode knows a newly joined DataNode ?
How does a new DataNode knows the existing NameNode ?
How does a JobTracker knows a newly joined TaskTracker ?
How does a new TaskTracker knows the existing JobTracker ?

Fail Recovery Scenario
=======================
Lets say a NameNode crash, and then another NameNode (at a different address) starts up.  How does the new NameNode learnt about other DataNodes ?
How does other DataNodes learn about this new NameNode ?

Lets say a JobTracker crash, and then another JobTracker (at a different address) starts up.  How does the new JobTracker learnt about other TaskTrackers ?
How does other TaskTrackers learn about this new JobTracker ?

Lets say a DataNode crash, and then another DataNode (at a different address) starts up.  How does the new DataNode learnt about the existing NameNode ?
How does the existing NameNode learn about this new DataNode ?

Lets say a TaskTracker crash, and then another TaskTracker (at a different address) starts up.  How does the new TaskTracker learnt about the existing JobTracker ?
How does the existing JobTracker learn about this new TaskTracker ?


Rgds,
Ricky

Re: Memory allocation - please confirm

Posted by tim robertson <ti...@gmail.com>.

Thanks!

Just making sure that this was the only parameter needing set.

Cheers

Tim


On Wed, Nov 26, 2008 at 1:20 PM, Dennis Kubes <ku...@apache.org> wrote:
> I have always seen -Xmx set in megabytes versus gigabytes.  It does work for
> me on Ubuntu as G but tt may depend on the JVM and OS, -Xmx1024M version
> -Xmx1G.  Other than that I think it looks good
>
> Dennis
>
> tim robertson wrote:
>>
>> Hi,
>>
>> Could you please sanity check this:
>>
>> In Hadoop-site.xml I add:
>>
>> <property>
>>  <name>mapred.child.java.opts</name>
>>  <value>-Xmx1G</value>
>>  <description>Increasing the size of the heap to allow for large in
>> memory index of polygons</description>
>> </property>
>>
>> Is this all required to increase the -Xmx for processes running Maps?
>>
>> (During Mapper.configure() I build a large hashtree lookup object...)
>>
>> Many thanks
>>
>> Tim
>

Re: Memory allocation - please confirm

Posted by Dennis Kubes <ku...@apache.org>.

I have always seen -Xmx set in megabytes versus gigabytes.  It does work 
for me on Ubuntu as G but tt may depend on the JVM and OS, -Xmx1024M 
version -Xmx1G.  Other than that I think it looks good

Dennis

tim robertson wrote:
> Hi,
> 
> Could you please sanity check this:
> 
> In Hadoop-site.xml I add:
> 
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1G</value>
>   <description>Increasing the size of the heap to allow for large in
> memory index of polygons</description>
> </property>
> 
> Is this all required to increase the -Xmx for processes running Maps?
> 
> (During Mapper.configure() I build a large hashtree lookup object...)
> 
> Many thanks
> 
> Tim