Posted to hdfs-user@hadoop.apache.org by Siddharth Tiwari <si...@live.com> on 2013/11/29 03:21:37 UTC

Multiple nic interfaces on datanode

Hi team,
I have 2 NICs on my DataNodes. Is it possible to use one dedicated to replication and the other for all other communication, i.e. with the JobTracker and NameNodes? Also, even though I am using a rack-awareness script and the DFS report shows the racks, the JobTracker shows all TaskTrackers in the default rack. How can I correct that?

Sent from my iPad

Re: Multiple nic interfaces on datanode

Posted by Adam Kawa <ka...@gmail.com>.
Yes, I think so (I do not have a JobTracker at hand to check, though).

The link may be a bit old (
http://archive.cloudera.com/cdh/3/hadoop/cluster_setup.html#Hadoop+Rack+Awareness),
but it says:

"The NameNode and the JobTracker obtains the rack id of the slaves in the
cluster by invoking an API
resolve<http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/net/DNSToSwitchMapping.html#resolve(java.util.List)>
 in an administrator configured module. The API resolves the slave's DNS
name (also IP address) to a rack id. What module to use can be configured
using the configuration item topology.node.switch.mapping.impl. The default
implementation of the same runs a script/command configured using
topology.script.file.name..."
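
As a minimal sketch of what such a script might look like (this is not from the quoted documentation): Hadoop passes one or more IP addresses or host names as arguments and reads back one rack id per argument from standard output. The subnet-to-rack table below is hypothetical, and so is the choice to answer /default-rack for anything the table does not cover.

```python
#!/usr/bin/env python3
"""Hypothetical rack-awareness script for topology.script.file.name."""
import ipaddress
import sys

# Rack id used here for anything that cannot be mapped (an assumption of this sketch).
DEFAULT_RACK = "/default-rack"

# Hypothetical subnet-to-rack table; replace with the real cluster layout.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/rack2",
}


def rack_for(host):
    """Return the rack id for one IP address, or DEFAULT_RACK if unknown."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Host names are not resolved in this sketch; they fall back to the default.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack id per argument, in the same order as the arguments.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

If the JobTracker cannot find or execute a script like this on its own host, it would have no rack ids to report other than the default, which would match the symptom described in this thread.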


2013/11/29 Siddharth Tiwari <si...@live.com>

> Hi Team/ Adam
>
> Thanks for the response
> When you say to have the rack-awareness script on the JobTracker, do you
> mean I must have the script on the JobTracker machines as well? I am using
> JT HA, with MR2 binaries for NameNode HA and MR1 for JT HA. Do I need to
> put the script on all four machines, or just on the NameNodes?
>
> Sent from my iPhone
>
> On Nov 29, 2013, at 6:58 AM, "Adam Kawa" <ka...@gmail.com> wrote:
>
>
>> I have 2 NICs on my DataNodes. Is it possible to use one dedicated to
>> replication and the other for all other communication, i.e. with the
>> JobTracker and NameNodes?
>
>
> Please correct me if I am wrong, but I have never seen support for that in
> Hadoop. DataNodes use a limited number of threads for balancing (AFAIK, at
> most 5, plus dfs.datanode.balance.bandwidthPerSec), so the resources used
> for balancing can be limited this way.
>
>
>> Also, even though I am using a rack-awareness script and the DFS report
>> shows the racks, the JobTracker shows all TaskTrackers in the default rack.
>> How can I correct that?
>>
>
> We switched to YARN recently, so it is difficult for me to check it now.
> Do you have the rack-awareness script deployed on the JobTracker machine?
>
>

Re: Multiple nic interfaces on datanode

Posted by Siddharth Tiwari <si...@live.com>.
Hi Team/ Adam

Thanks for the response
When you say to have the rack-awareness script on the JobTracker, do you mean I must have the script on the JobTracker machines as well? I am using JT HA, with MR2 binaries for NameNode HA and MR1 for JT HA. Do I need to put the script on all four machines, or just on the NameNodes?

Sent from my iPhone

> On Nov 29, 2013, at 6:58 AM, "Adam Kawa" <ka...@gmail.com> wrote:
> 
> 
>> I have 2 NICs on my DataNodes. Is it possible to use one dedicated to replication and the other for all other communication, i.e. with the JobTracker and NameNodes?
> 
> Please correct me if I am wrong, but I have never seen support for that in Hadoop. DataNodes use a limited number of threads for balancing (AFAIK, at most 5, plus dfs.datanode.balance.bandwidthPerSec), so the resources used for balancing can be limited this way.
> 
>> Also, even though I am using a rack-awareness script and the DFS report shows the racks, the JobTracker shows all TaskTrackers in the default rack. How can I correct that?
> 
> We switched to YARN recently, so it is difficult for me to check it now.
> Do you have the rack-awareness script deployed on the JobTracker machine?
> 

Re: Multiple nic interfaces on datanode

Posted by Adam Kawa <ka...@gmail.com>.
> I have 2 NICs on my DataNodes. Is it possible to use one dedicated to
> replication and the other for all other communication, i.e. with the
> JobTracker and NameNodes?


Please correct me if I am wrong, but I have never seen support for that in
Hadoop. DataNodes use a limited number of threads for balancing (AFAIK, at
most 5, plus dfs.datanode.balance.bandwidthPerSec), so the resources used
for balancing can be limited this way.


> Also, even though I am using a rack-awareness script and the DFS report
> shows the racks, the JobTracker shows all TaskTrackers in the default rack.
> How can I correct that?
>

We switched to YARN recently, so it is difficult for me to check it now.
Do you have the rack-awareness script deployed on the JobTracker machine?
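
One hedged way to check this is to run the configured topology script by hand on the JobTracker host and see which TaskTracker addresses come back as /default-rack. The sketch below assumes the script prints one rack id per argument; the helper names resolve_racks and unresolved are made up for illustration and are not part of Hadoop.

```python
import subprocess

DEFAULT_RACK = "/default-rack"


def resolve_racks(script_cmd, hosts):
    """Run the topology script with the hosts as arguments and map host -> rack id.

    script_cmd is the command as a list, e.g. ["/path/to/topology-script"]
    (a placeholder path; use whatever topology.script.file.name points to).
    """
    hosts = list(hosts)
    result = subprocess.run(
        script_cmd + hosts,
        capture_output=True,
        text=True,
        check=True,  # raise if the script is missing permissions or exits non-zero
    )
    racks = result.stdout.split()
    if len(racks) != len(hosts):
        raise ValueError(
            f"expected {len(hosts)} rack ids, got {len(racks)}: {racks!r}"
        )
    return dict(zip(hosts, racks))


def unresolved(mapping):
    """Return the hosts that were mapped to the default rack, sorted."""
    return sorted(host for host, rack in mapping.items() if rack == DEFAULT_RACK)
```

Calling resolve_racks with the script path and a few TaskTracker IP addresses, then passing the result to unresolved, lists the nodes that the script on that machine cannot place. An exception from subprocess.run would suggest the script is missing or not executable on that host.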

Re: Multiple nic interfaces on datanode

Posted by Charles Woerner <ch...@gmail.com>.
Depending on what you are trying to accomplish with this setup, you might be better off bonding the NICs and using the single bonded interface for both kinds of traffic.

Sent from my iPhone

> On Nov 28, 2013, at 6:21 PM, Siddharth Tiwari <si...@live.com> wrote:
> 
> Hi team,
> I have 2 NICs on my DataNodes. Is it possible to use one dedicated to replication and the other for all other communication, i.e. with the JobTracker and NameNodes? Also, even though I am using a rack-awareness script and the DFS report shows the racks, the JobTracker shows all TaskTrackers in the default rack. How can I correct that?
> 
> Sent from my iPad
