Posted to common-user@hadoop.apache.org by Abdul Navaz <na...@gmail.com> on 2014/11/17 00:18:27 UTC

Configure Rack Numbers

Hello,
I have a Hadoop cluster with 9 nodes. All of them currently belong to the default rack (/default-rack), but I
want to set it up like this
(all nodes are in the same subnet):
 Rack 0: DataNode1, DataNode2, DataNode3 and top-of-rack switch 1.
 Rack 1: DataNode4, DataNode5, DataNode6 and top-of-rack switch 2.
 Rack 2: DataNode7, DataNode8, DataNode9 and top-of-rack switch 3.
I am trying to verify Hadoop rack awareness, i.e. how it places one replica
of a block in one rack and the other replicas in a different rack, and to
analyse the network performance of this.
So how can I separate these DataNodes by rack? Where do I configure the
rack numbers and declare which rack a given DataNode belongs to?


Thanks & Regards,

Abdul Navaz




Re: Configure Rack Numbers

Posted by Serge Blazhievsky <ha...@gmail.com>.
An effective technique to fix block distribution after a change in rack awareness is to increase the replication factor and then decrease it back.
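
Roughly, assuming a default replication factor of 3 and an example path /data (adjust both to your cluster):

hadoop fs -setrep -w 4 /data    # temporarily raise replication; the new replicas are placed rack-aware
hadoop fs -setrep 3 /data       # drop back to the original factor; the NameNode deletes the excess replicas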

Regards,
Serge

> On Nov 16, 2014, at 21:10, Brahma Reddy Battula <br...@huawei.com> wrote:
> 
> Hi Navaz,
> 
> you have to configure the following two properties in namenode(after that you need to restart the namenode).
> 
>  <property>
>   <name>topology.node.switch.mapping.impl</name>
>   <value>org.apache.hadoop.net.ScriptBasedMapping</value>
>   <description> The default implementation of the DNSToSwitchMapping. It
>     invokes a script specified in topology.script.file.name to resolve
>     node names. If the value for topology.script.file.name is not set, the
>     default value of DEFAULT_RACK is returned for all node names.
>   </description>
> </property>
> 
> <property>
>   <name>topology.script.file.name</name>
>   <value>/path/to/topo.sh</value>
>   <description> The script name that should be invoked to resolve DNS names to
>     NetworkTopology names. Example: the script would take host.foo.bar as an
>     argument, and return /rack1 as the output.
>   </description>
> </property>
> 
> 
> Example script file.
> 
> 
> topo.sh
> =======
> 
> #!/bin/bash
> 
> python <TOPOLOGY_SCRIPT_HOME>/topology.py "$@"
> 
> 
> topology.py 
> ===========
>  import sys 
> from string import join 
> 
> DEFAULT_RACK = '/default/rack0'; 
> 
> RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0', 
>              '1.2.3.4' : '/datacenter1/rack1', 
>              '1.2.3.5' : '/datacenter1/rack1', 
>              '1.2.3.6' : '/datacenter1/rack1', 
> 
>              '10.2.3.4' : '/datacenter1/rack2', 
>              '10.2.3.4' : '/datacenter1/rack2' 
>     } 
> 
> if len(sys.argv)==1: 
>     print DEFAULT_RACK 
> else: 
>     print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]," ") 
> 
> 
> Please check the following link for more details.
> 
> 
> https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
> 
> 
> 
> Thanks & Regards
> 
>  Brahma Reddy Battula
> 
>  
> 
> HUAWEI TECHNOLOGIES INDIA PVT.LTD.  
> Ground,1&2 floors,Solitaire,  
> 139/26,Amarjyoti Layout,Intermediate Ring Road,Domlur  
> Bangalore - 560 071 , India  
> Tel : +91- 80- 3980 9600  Ext No: 4905 
>  Fax : +91-80-41118578 
> 
> From: Abdul Navaz [navaz.enc@gmail.com]
> Sent: Monday, November 17, 2014 4:48 AM
> To: user@hadoop.apache.org
> Subject: Configure Rack Numbers
> 
> Hello,
> 
> I have hadoop cluster with 9 nodes. All belongs to /default racks. But I want the setup something similar to this.
> 
> (All are in same subnets)
> 
>  Rack 0: DataNode1,Datanode2,DataNode3 and top of rack switch1.
>  Rack 1: DataNode4,Datanode5,DataNode6 and top of rack switch2.
>  Rack 3: DataNode7,Datanode8,DataNode9 and top of rack switch3.
> I am trying to check the Hadoop rack awareness and how it copies the single block of data in one rack and replicas in some other rack. I want to analyse some network performance from this.
> 
> So how can we separate this DNs based on rack numbers. Where can we configure this rack numbers and say this DN belongs to this rack number.
> 
> 
> 
> Thanks & Regards,
> 
> Abdul Navaz
> 

Re: Configure Rack Numbers

Posted by Abdul Navaz <na...@gmail.com>.
Hello,

Thank you very much for this document. How can I see which rack each
DataNode belongs to?

I ran the command below on the NameNode and it throws an error:

bin/hadoop dfsadmin -printTopology
Warning: $HADOOP_HOME is deprecated.

printTopology: Unknown command

Is there any other command to check this?
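
(Assuming this Hadoop version's fsck supports the -racks option, I could also try printing block locations together with their rack paths:)

bin/hadoop fsck / -files -blocks -locations -racks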

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388


From:  Brahma Reddy Battula <br...@huawei.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Sunday, November 16, 2014 at 11:10 PM
To:  "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject:  RE: Configure Rack Numbers

Hi Navaz,

you have to configure the following two properties in namenode(after that
you need to restart the namenode).

<property>
  <name>topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
  <description> The default implementation of the DNSToSwitchMapping. It
    invokes a script specified in topology.script.file.name to resolve
    node names. If the value for topology.script.file.name is not set, the
    default value of DEFAULT_RACK is returned for all node names.
  </description>
</property>

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/topo.sh</value>
  <description> The script name that should be invoked to resolve DNS names to
    NetworkTopology names. Example: the script would take host.foo.bar as an
    argument, and return /rack1 as the output.
  </description>
</property>


Example script file.


topo.sh
=======

#!/bin/bash

python <TOPOLOGY_SCRIPT_HOME>/topology.py "$@"


topology.py 
===========
import sys 
from string import join

DEFAULT_RACK = '/default/rack0';

RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0',
             '1.2.3.4' : '/datacenter1/rack1',
             '1.2.3.5' : '/datacenter1/rack1',
             '1.2.3.6' : '/datacenter1/rack1',

             '10.2.3.4' : '/datacenter1/rack2',
             '10.2.3.4' : '/datacenter1/rack2'
    } 

if len(sys.argv)==1:
    print DEFAULT_RACK
else: 
    print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]," ")


Please check the following link for more details.


https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf



Thanks & Regards

 Brahma Reddy Battula

 

HUAWEI TECHNOLOGIES INDIA PVT.LTD.
Ground,1&2 floors,Solitaire,
139/26,Amarjyoti Layout,Intermediate Ring Road,Domlur
Bangalore - 560 071 , India
Tel : +91- 80- 3980 9600  Ext No: 4905
 Fax : +91-80-41118578


From: Abdul Navaz [navaz.enc@gmail.com]
Sent: Monday, November 17, 2014 4:48 AM
To: user@hadoop.apache.org
Subject: Configure Rack Numbers

Hello,
I have hadoop cluster with 9 nodes. All belongs to /default racks. But I
want the setup something similar to this.
(All are in same subnets)
 Rack 0: DataNode1,Datanode2,DataNode3 and top of rack switch1.
 Rack 1: DataNode4,Datanode5,DataNode6 and top of rack switch2.
 Rack 3: DataNode7,Datanode8,DataNode9 and top of rack switch3.
I am trying to check the Hadoop rack awareness and how it copies the single
block of data in one rack and replicas in some other rack. I want to analyse
some network performance from this.
So how can we separate this DNs based on rack numbers. Where can we
configure this rack numbers and say this DN belongs to this rack number.


Thanks & Regards,

Abdul Navaz




RE: Configure Rack Numbers

Posted by Brahma Reddy Battula <br...@huawei.com>.
Hi Navaz,

You have to configure the following two properties on the NameNode (after that, you need to restart the NameNode).


<property>
  <name>topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
  <description> The default implementation of the DNSToSwitchMapping. It
    invokes a script specified in topology.script.file.name to resolve
    node names. If the value for topology.script.file.name is not set, the
    default value of DEFAULT_RACK is returned for all node names.
  </description>
</property>

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/topo.sh</value>
  <description> The script name that should be invoked to resolve DNS names to
    NetworkTopology names. Example: the script would take host.foo.bar as an
    argument, and return /rack1 as the output.
  </description>
</property>
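
These two properties normally go into the NameNode's core-site.xml (an assumption about your layout), and topo.sh must be executable by the NameNode user. A rough sketch, assuming $HADOOP_HOME points at your installation:

chmod +x /path/to/topo.sh
# restart only the NameNode so it re-reads the mapping configuration
$HADOOP_HOME/bin/hadoop-daemon.sh stop namenode
$HADOOP_HOME/bin/hadoop-daemon.sh start namenode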


Example script file.


topo.sh
=======

#!/bin/bash

python <TOPOLOGY_SCRIPT_HOME>/topology.py "$@"


topology.py
===========

# Python 2 mapping script: given DataNode IPs or hostnames as arguments,
# print the rack path for each one.
import sys

DEFAULT_RACK = '/default/rack0'

# One entry per DataNode; the addresses here are only examples.
RACK_MAP = { '208.94.2.10' : '/datacenter1/rack0',

             '1.2.3.4' : '/datacenter1/rack1',
             '1.2.3.5' : '/datacenter1/rack1',
             '1.2.3.6' : '/datacenter1/rack1',

             '10.2.3.4' : '/datacenter1/rack2',
           }

if len(sys.argv) == 1:
    # Called with no arguments: fall back to the default rack.
    print DEFAULT_RACK
else:
    # Unknown hosts map to the default rack; output is space-separated.
    print " ".join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]])
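
A quick way to sanity-check the mapping, using the example addresses above, is to call the wrapper script by hand (after replacing <TOPOLOGY_SCRIPT_HOME> with the real directory); it prints one rack path per argument:

./topo.sh 1.2.3.4 10.2.3.4
# expected output: /datacenter1/rack1 /datacenter1/rack2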


Please check the following link for more details.


https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf




Thanks & Regards

 Brahma Reddy Battula



HUAWEI TECHNOLOGIES INDIA PVT.LTD.
Ground,1&2 floors,Solitaire,
139/26,Amarjyoti Layout,Intermediate Ring Road,Domlur
Bangalore - 560 071 , India
Tel : +91- 80- 3980 9600  Ext No: 4905
 Fax : +91-80-41118578

________________________________
From: Abdul Navaz [navaz.enc@gmail.com]
Sent: Monday, November 17, 2014 4:48 AM
To: user@hadoop.apache.org
Subject: Configure Rack Numbers


Hello,

I have hadoop cluster with 9 nodes. All belongs to /default racks. But I want the setup something similar to this.

(All are in same subnets)

 Rack 0: DataNode1,Datanode2,DataNode3 and top of rack switch1.
 Rack 1: DataNode4,Datanode5,DataNode6 and top of rack switch2.
 Rack 3: DataNode7,Datanode8,DataNode9 and top of rack switch3.


I am trying to check the Hadoop rack awareness and how it copies the single block of data in one rack and replicas in some other rack. I want to analyse some network performance from this.

So how can we separate this DNs based on rack numbers. Where can we configure this rack numbers and say this DN belongs to this rack number.


Thanks & Regards,

Abdul Navaz

