Posted to common-user@hadoop.apache.org by Sugandha Naolekar <su...@gmail.com> on 2009/08/18 13:10:53 UTC

Rack Awareness!

 Hello!


 I have 6 nodes and I want to configure them in racks. Below are the details
of the machines:





  Name of the machine   IP             Roles played
  namenode              10.20.220.30   namenode
  jobsec                10.20.220.31   jobtracker and secondary NN
  repository1           10.20.220.35   DN and TT 1
  repository2           10.20.220.78   DN and TT 2
  repository3           10.20.220.71   DN and TT 3
  repository4           10.20.220.74   DN and TT 4


Now, I want to configure the first three datanodes (35, 78, 71) in rack 1 and
the fourth DN (74) in rack 2, both under the jobtracker (jobsec here). Thus,
jobsec is, in a way, the data center, right?
Below is the Python script I have written. Please let me know whether it is
correct. Also, will the script get invoked just by setting this file's path in
the specified property in hadoop-site.xml? Will the machines automatically get
configured as per the topology described in the script?
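
This is the hadoop-site.xml fragment I am planning to use to wire the script
in (the topology.script.number.args entry is my assumption about limiting how
many host arguments Hadoop passes per invocation; the path is just where I
keep the script):

```
<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.py</value>
</property>
<property>
  <name>topology.script.number.args</name>
  <value>100</value>
</property>
```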

#!/usr/bin/env python

'''
This script is used by Hadoop to determine network/rack topology. It
should be specified in hadoop-site.xml via the topology.script.file.name
property:

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.py</value>
</property>
'''

import sys

DEFAULT_RACK = '/default/rack0'

# Map each datanode IP to its rack path.
RACK_MAP = {
    '10.20.220.35': '/jobsec/rack1',
    '10.20.220.78': '/jobsec/rack1',
    '10.20.220.71': '/jobsec/rack1',

    '10.20.220.74': '/jobsec/rack2',
}

if len(sys.argv) == 1:
    # No arguments: fall back to the default rack.
    print(DEFAULT_RACK)
else:
    # One rack path per argument, space-separated; unknown hosts
    # resolve to the default rack.
    print(' '.join(RACK_MAP.get(ip, DEFAULT_RACK) for ip in sys.argv[1:]))
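
To check my own understanding of the mapping logic before wiring it into
Hadoop, I wrote this small standalone sketch (resolve() is just a helper
name I made up; it mirrors the script above without sys.argv):

```python
# Standalone sanity check of the rack-mapping logic (not part of the
# deployed script). Unknown IPs fall back to the default rack.
RACK_MAP = {
    '10.20.220.35': '/jobsec/rack1',
    '10.20.220.78': '/jobsec/rack1',
    '10.20.220.71': '/jobsec/rack1',
    '10.20.220.74': '/jobsec/rack2',
}
DEFAULT_RACK = '/default/rack0'

def resolve(args):
    """Return one rack path per argument, space-separated,
    or the default rack when called with no arguments."""
    if not args:
        return DEFAULT_RACK
    return ' '.join(RACK_MAP.get(ip, DEFAULT_RACK) for ip in args)

print(resolve(['10.20.220.35', '10.20.220.74']))  # /jobsec/rack1 /jobsec/rack2
print(resolve(['10.20.220.99']))                  # /default/rack0
```

If this matches what Hadoop expects (whitespace-separated rack paths on
stdout, in argument order), then the main script should behave the same way.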


-- 
Regards!
Sugandha