Posted to mapreduce-user@hadoop.apache.org by Virajith Jalaparti <vi...@gmail.com> on 2011/07/07 14:39:05 UTC

Using hadoop over machines with multiple interfaces

Hi,

I am trying to set up a Hadoop cluster (using hadoop-0.20.2) on a bunch
of machines, each of which has two interfaces: a control interface and an
internal interface. I want Hadoop to use only the internal interface (all
Hadoop control and data traffic should be sent over the internal interface
only). I set dfs.datanode.dns.interface in hdfs-site.xml and
mapred.tasktracker.dns.interface in mapred-site.xml to point to the internal
interface on each of the machines in my cluster. However, even after that,
the communication happens on the control interface (a tcpdump shows that the
control interface of the nodes is being used to transfer data during the
shuffle phase!).
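
For reference, this is roughly what the settings look like (a sketch; eth1
here is just a placeholder for whatever the internal interface is named on a
given machine):

```xml
<!-- hdfs-site.xml: tell the DataNode which interface to use when
     determining its own hostname (placeholder interface name) -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>
```

```xml
<!-- mapred-site.xml: the equivalent setting for the TaskTracker -->
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth1</value>
</property>
```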

How can I make sure that all data exchanged between the slaves in my cluster
goes through the internal interface rather than the control interface? Any
help would be appreciated.

Thanks,
Virajith

Re: Using hadoop over machines with multiple interfaces

Posted by Virajith Jalaparti <vi...@gmail.com>.
Just to be more specific: different slaves in my cluster have different
interfaces configured as the internal interface. For example, node1 uses
eth19 for internal connectivity, whereas node2 uses eth20. I modified
hdfs-site.xml and mapred-site.xml on each node, making sure that
dfs.datanode.dns.interface and mapred.tasktracker.dns.interface correctly
contain the name of the interface configured as the internal interface
on that node. In particular, for the above example, node1 has eth19 as the
value of dfs.datanode.dns.interface and mapred.tasktracker.dns.interface,
and node2 has eth20 as the value for both properties.
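
Concretely, the per-node configuration described above would look something
like this (same property names on every node, but a different interface
value per node):

```xml
<!-- node1: hdfs-site.xml and mapred-site.xml both point at eth19 -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth19</value>
</property>
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth19</value>
</property>
```

```xml
<!-- node2: the same two properties, but with eth20 -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth20</value>
</property>
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth20</value>
</property>
```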

I am not sure why this configuration does not ensure that all the nodes
use their internal interfaces for Hadoop communication.
Please let me know if you have any idea why this is not working. Thanks!

-Virajith
