Posted to common-user@hadoop.apache.org by Sugandha Naolekar <su...@gmail.com> on 2009/08/18 13:10:53 UTC
Rack Awareness!
Hello!
I have 6 nodes and I want to configure them in racks. Below are the details
of the machines:

Name of the machine    IP             Roles played
namenode               10.20.220.30   namenode
jobsec                 10.20.220.31   jobtracker and secondary NN
repository1            10.20.220.35   DN and TT - 1
repository2            10.20.220.78   DN and TT - 2
repository3            10.20.220.71   DN and TT - 3
repository4            10.20.220.74   DN and TT - 4
Now, I want to configure the first three datanodes (35, 78, 71) in rack 1 and
the fourth DN (74) in rack 2. Thus, jobsec here acts, in a way, as the
datacenter, right?
Below is the Python script I have written. Please let me know whether it is
correct. Also, will the script be invoked just by setting this file's path in
the specified property in hadoop-site.xml? Will the machines automatically be
configured according to the topology described in the script?
#!/usr/bin/env python
'''
This script is used by Hadoop to determine network/rack topology. It
should be specified in hadoop-site.xml via the topology.script.file.name
property:

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.py</value>
</property>
'''
import sys

DEFAULT_RACK = '/default/rack0'

RACK_MAP = {
    '10.20.220.35': '/jobsec/rack1',
    '10.20.220.78': '/jobsec/rack1',
    '10.20.220.71': '/jobsec/rack1',
    '10.20.220.74': '/jobsec/rack2',
}

if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    print(' '.join(RACK_MAP.get(ip, DEFAULT_RACK) for ip in sys.argv[1:]))
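For what it's worth, Hadoop runs the topology script with one or more IP
addresses (or hostnames) as command-line arguments and expects one rack path
per argument on stdout, separated by whitespace. A minimal sanity check of the
same lookup logic, runnable without Hadoop (the resolve() helper is just for
illustration, not part of the script Hadoop calls):

```python
# Same map and default as in the topology script above.
RACK_MAP = {
    '10.20.220.35': '/jobsec/rack1',
    '10.20.220.78': '/jobsec/rack1',
    '10.20.220.71': '/jobsec/rack1',
    '10.20.220.74': '/jobsec/rack2',
}
DEFAULT_RACK = '/default/rack0'

def resolve(addresses):
    # One rack path per input address; unknown addresses fall back to the default rack.
    return ' '.join(RACK_MAP.get(ip, DEFAULT_RACK) for ip in addresses)

print(resolve(['10.20.220.35', '10.20.220.74']))  # prints: /jobsec/rack1 /jobsec/rack2
print(resolve(['10.20.220.99']))                  # prints: /default/rack0
```

Note that the script also has to be executable (chmod +x) and readable by the
user running the NameNode for Hadoop to be able to invoke it.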
--
Regards!
Sugandha