Posted to hdfs-user@hadoop.apache.org by Jeff Stuckman <st...@umd.edu> on 2013/12/15 05:38:35 UTC

Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Hello,



I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming jobs with Hadoop 2.2.0. I am having problems running tasks on a NodeManager (NM) that is on a different host than the ResourceManager (RM), and I believe this is happening because the NM host's dfs.client.local.interfaces property is not taking effect.



I have two hosts set up as follows:

Host A (1.2.3.4):

NameNode

DataNode

ResourceManager

Job History Server



Host B (5.6.7.8):

DataNode

NodeManager



On each host, hdfs-site.xml was edited to change dfs.client.local.interfaces from an interface name ("eth0") to the IPv4 address of that host's interface ("1.2.3.4" or "5.6.7.8"). This is to prevent the HDFS client from randomly binding to the IPv6 side of the interface (it randomly alternates between the IPv4 and IPv6 addresses, due to the random bind-address selection in the DFS client), which was causing other problems.
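
For reference, the corresponding stanza in hdfs-site.xml on Host B looks roughly like this (Host A uses 1.2.3.4 instead):

  <property>
    <name>dfs.client.local.interfaces</name>
    <value>5.6.7.8</value>
  </property>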



However, I am observing that the Yarn container on the NM appears to inherit the property from the copy of hdfs-site.xml on the RM, rather than reading it from the local configuration file. In other words, setting the dfs.client.local.interfaces property in Host A's configuration file causes the Yarn containers on Host B to use the same value of the property. This causes the map task to fail, as the container cannot establish a TCP connection to HDFS. However, on Host B, other commands that access HDFS (such as "hadoop fs") do work, as they respect the local value of the property.



To illustrate with an example, I start a streaming job from the command line on Host A:



hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper /home/hadoop/toRecords.pl -reducer /bin/cat



The NodeManager on Host B notes that there was an error starting the container:



13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1387067177654_0002_01_000001 and exit code: 1

org.apache.hadoop.util.Shell$ExitCodeException:

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)

        at org.apache.hadoop.util.Shell.run(Shell.java:379)

        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

        at java.util.concurrent.FutureTask.run(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

        at java.lang.Thread.run(Unknown Source)



On Host B, I open userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog and find the following messages (note the DEBUG-level messages which I manually enabled for the DFS client):
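
(For reference, one way to enable this DFSClient debug output is a log4j setting along these lines, in whichever log4j.properties the task JVMs pick up:)

  log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG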



2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]

<cut>

2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: newInfo = LocatedBlocks{

  fileLength=537

  underConstruction=false

  blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]

  lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}

  isLastBlockComplete=true}

2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Connecting to datanode 5.6.7.8:50010

2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interface /1.2.3.4:0

2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and continue. java.net.BindException: Cannot assign requested address



Note the failure to bind to 1.2.3.4, as the IP for Host B's local interface is actually 5.6.7.8.



Note that when running other HDFS commands on Host B, Host B's setting for dfs.client.local.interfaces is respected. For example, on Host B:



hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/

13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8] with addresses [/5.6.7.8:0]

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system

drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp



If I change dfs.client.local.interfaces on Host A to eth0 (without touching the setting on Host B), the syslog mentioned above instead shows the following:



2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/5.6.7.8:0]



The job then sometimes completes successfully, but both Host A and Host B will then randomly alternate between the IPv4 and IPv6 sides of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.

Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.



Thanks,

Jeff


Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Posted by Azuryy Yu <az...@gmail.com>.
Jeff,
The DFSClient doesn't use a copied Configuration from the RM.

Did you add hostnames or IP addresses in conf/slaves? If hostnames, can you check /etc/hosts? Are there any conflicts?
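
For example (hostnames here are just placeholders matching the example addresses above), the slaves file and /etc/hosts should agree on every node:

  conf/slaves:
    hosta
    hostb

  /etc/hosts on each node (no conflicting or duplicate entries for these names):
    1.2.3.4   hosta
    5.6.7.8   hostb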



On Mon, Dec 16, 2013 at 5:01 AM, Jeff Stuckman <st...@umd.edu> wrote:

>  Thanks for the response. I have the preferIPv4Stack option in
> hadoop-env.sh; however; this was not preventing the mapreduce container
> from enumerating the IPv6 address of the interface.
>
>
>
> Jeff
>
>
>
> *From:* Chris Mawata [mailto:chris.mawata@gmail.com]
> *Sent:* Sunday, December 15, 2013 3:58 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Site-specific dfs.client.local.interfaces setting not
> respected for Yarn MR container
>
>
>
> You might have better luck with an alternative approach to avoid having
> IPV6 which is to add to your hadoop-env.sh
>
> HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
>
>
>
> Chris
>
>
>
>
>
> On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
>
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming
> jobs with Hadoop 2.2.0. I am having problems with running tasks on a NM
> which is on a different host than the RM, and I believe that this is
> happening because the NM host's dfs.client.local.interfaces property is not
> having any effect.
>
>
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
>
> NameNode
>
> DataNode
>
> ResourceManager
>
> Job History Server
>
>
>
> Host B (5.6.7.8):
>
> DataNode
>
> NodeManager
>
>
>
> On each host, hdfs-site.xml was edited to change
> dfs.client.local.interfaces from an interface name ("eth0") to the IPv4
> address representing that host's interface ("1.2.3.4" or "5.6.7.8"). This
> is to prevent the HDFS client from randomly binding to the IPv6 side of the
> interface (it randomly swaps between the IP4 and IP6 addresses, due to the
> random bind IP selection in the DFS client) which was causing other
> problems.
>
>
>
> However, I am observing that the Yarn container on the NM appears to
> inherit the property from the copy of hdfs-site.xml on the RM, rather than
> reading it from the local configuration file. In other words, setting the
> dfs.client.local.interfaces property in Host A's configuration file causes
> the Yarn containers on Host B to use same value of the property. This
> causes the map task to fail, as the container cannot establish a TCP
> connection to the HDFS. However, on Host B, other commands that access the
> HDFS (such as "hadoop fs") do work, as they respect the local value of the
> property.
>
>
>
> To illustrate with an example, I start a streaming job from the command
> line on Host A:
>
>
>
> hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
> -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
>
>
> The NodeManager on Host B notes that there was an error starting the
> container:
>
>
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1387067177654_0002_01_000001 and exit code: 1
>
> org.apache.hadoop.util.Shell$ExitCodeException:
>
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>
>         at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
>
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
>
>         at java.lang.Thread.run(Unknown Source)
>
>
>
> On Host B, I open
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog
> and find the following messages (note the DEBUG-level messages which I
> manually enabled for the DFS client):
>
>
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
>
> <cut>
>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> newInfo = LocatedBlocks{
>
>   fileLength=537
>
>   underConstruction=false
>
>
> blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010,
> 1.2.3.4:50010]}]
>
>
> lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493;
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010,
> 1.2.3.4:50010]}
>
>   isLastBlockComplete=true}
>
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Connecting to datanode 5.6.7.8:50010
>
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interface /1.2.3.4:0
>
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient:
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and
> continue. java.net.BindException: Cannot assign requested address
>
>
>
> Note the failure to bind to 1.2.3.4, as the IP for Node B's local
> interface is actually 5.6.7.8.
>
>
>
> Note that when running other HDFS commands on Host B, Host B's setting for
> dfs.client.local.interfaces is respected. On host B:
>
>
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
>
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8]
> with addresses [/5.6.7.8:0]
>
> Found 3 items
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40
> hdfs://hosta/linesin
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01
> hdfs://hosta/system
>
> drwx------   - hadoop supergroup          0 2013-12-14 10:31
> hdfs://hosta/tmp
>
>
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without
> touching the setting on Host B), the syslog mentioned above instead shows
> the following:
>
>
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient:
> Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/
> 5.6.7.8:0]
>
>
>
> The job then successfully completes sometimes, but both Host A and Host B
> will then randomly alternate between the IP4 and IP6 side of their eth0
> interfaces, which causes other issues. In other words, changing the
> dfs.client.local.interfaces setting on Host A to a named adapter caused the
> Yarn container on Host B to bind to an identically named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will try
> to bind to its own interface? I successfully worked around this issue by
> doing a custom build of HDFS which hardcodes my IP address in the
> DFSClient, but I am looking for a better long-term solution.
>
>
>
> Thanks,
>
> Jeff
>
>
>
>
>

RE: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Posted by Jeff Stuckman <st...@umd.edu>.
Thanks for the response. I have the preferIPv4Stack option in hadoop-env.sh; however, this was not preventing the mapreduce container from enumerating the IPv6 address of the interface.
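
(For what it's worth, as far as I can tell HADOOP_OPTS is only applied to JVMs started through the hadoop launcher scripts, not to the YARN task containers, which take their JVM options from the mapreduce.*.java.opts properties. A rough sketch of passing the same flag to the task JVMs via mapred-site.xml, keeping whatever heap settings are already in use:)

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true</value>
  </property>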

Jeff

From: Chris Mawata [mailto:chris.mawata@gmail.com]
Sent: Sunday, December 15, 2013 3:58 PM
To: user@hadoop.apache.org
Subject: Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

You might have better luck with an alternative approach to avoiding IPv6, which is to add the following to your hadoop-env.sh:

HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"



Chris




On 12/14/2013 11:38 PM, Jeff Stuckman wrote:

Hello,



I have set up a two-node Hadoop cluster on Ubuntu 12.04 running streaming jobs with Hadoop 2.2.0. I am having problems with running tasks on a NM which is on a different host than the RM, and I believe that this is happening because the NM host's dfs.client.local.interfaces property is not having any effect.



I have two hosts set up as follows:

Host A (1.2.3.4):

NameNode

DataNode

ResourceManager

Job History Server



Host B (5.6.7.8):

DataNode

NodeManager



On each host, hdfs-site.xml was edited to change dfs.client.local.interfaces from an interface name ("eth0") to the IPv4 address representing that host's interface ("1.2.3.4" or "5.6.7.8"). This is to prevent the HDFS client from randomly binding to the IPv6 side of the interface (it randomly swaps between the IP4 and IP6 addresses, due to the random bind IP selection in the DFS client) which was causing other problems.



However, I am observing that the Yarn container on the NM appears to inherit the property from the copy of hdfs-site.xml on the RM, rather than reading it from the local configuration file. In other words, setting the dfs.client.local.interfaces property in Host A's configuration file causes the Yarn containers on Host B to use same value of the property. This causes the map task to fail, as the container cannot establish a TCP connection to the HDFS. However, on Host B, other commands that access the HDFS (such as "hadoop fs") do work, as they respect the local value of the property.



To illustrate with an example, I start a streaming job from the command line on Host A:



hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper /home/hadoop/toRecords.pl -reducer /bin/cat



The NodeManager on Host B notes that there was an error starting the container:



13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1387067177654_0002_01_000001 and exit code: 1

org.apache.hadoop.util.Shell$ExitCodeException:

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)

        at org.apache.hadoop.util.Shell.run(Shell.java:379)

        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

        at java.util.concurrent.FutureTask.run(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

        at java.lang.Thread.run(Unknown Source)



On Host B, I open userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog and find the following messages (note the DEBUG-level messages which I manually enabled for the DFS client):



2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]

<cut>

2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: newInfo = LocatedBlocks{

  fileLength=537

  underConstruction=false

  blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}]

  lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 1.2.3.4:50010]}

  isLastBlockComplete=true}

2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Connecting to datanode 5.6.7.8:50010

2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interface /1.2.3.4:0

2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and continue. java.net.BindException: Cannot assign requested address



Note the failure to bind to 1.2.3.4, as the IP for Node B's local interface is actually 5.6.7.8.



Note that when running other HDFS commands on Host B, Host B's setting for dfs.client.local.interfaces is respected. On host B:



hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/

13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces [5.6.7.8] with addresses [/5.6.7.8:0]

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 hdfs://hosta/linesin

drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 hdfs://hosta/system

drwx------   - hadoop supergroup          0 2013-12-14 10:31 hdfs://hosta/tmp



If I change dfs.client.local.interfaces on Host A to eth0 (without touching the setting on Host B), the syslog mentioned above instead shows the following:



2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: Using local interfaces [eth0] with addresses [/<some IP6 address>:0,/5.6.7.8:0]



The job then successfully completes sometimes, but both Host A and Host B will then randomly alternate between the IP4 and IP6 side of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.

Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.



Thanks,

Jeff






The job then successfully completes sometimes, but both Host A and Host B will then randomly alternate between the IP4 and IP6 side of their eth0 interfaces, which causes other issues. In other words, changing the dfs.client.local.interfaces setting on Host A to a named adapter caused the Yarn container on Host B to bind to an identically named adapter.

Any ideas on how I can reconfigure the cluster so every container will try to bind to its own interface? I successfully worked around this issue by doing a custom build of HDFS which hardcodes my IP address in the DFSClient, but I am looking for a better long-term solution.



Thanks,

Jeff



Re: Site-specific dfs.client.local.interfaces setting not respected for Yarn MR container

Posted by Chris Mawata <ch...@gmail.com>.
You might have better luck with an alternative approach to avoiding IPv6, which is to add the following to your hadoop-env.sh:

HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

Chris



On 12/14/2013 11:38 PM, Jeff Stuckman wrote:
>
> Hello,
>
> I have set up a two-node Hadoop cluster on Ubuntu 12.04 running 
> streaming jobs with Hadoop 2.2.0. I am having problems with running 
> tasks on a NM which is on a different host than the RM, and I believe 
> that this is happening because the NM host's 
> dfs.client.local.interfaces property is not having any effect.
>
> I have two hosts set up as follows:
>
> Host A (1.2.3.4):
>
> NameNode
>
> DataNode
>
> ResourceManager
>
> Job History Server
>
> Host B (5.6.7.8):
>
> DataNode
>
> NodeManager
>
> On each host, hdfs-site.xml was edited to change 
> dfs.client.local.interfaces from an interface name ("eth0") to the 
> IPv4 address representing that host's interface ("1.2.3.4" or 
> "5.6.7.8"). This is to prevent the HDFS client from randomly binding 
> to the IPv6 side of the interface (it randomly swaps between the IP4 
> and IP6 addresses, due to the random bind IP selection in the DFS 
> client) which was causing other problems.
>
> However, I am observing that the Yarn container on the NM appears to 
> inherit the property from the copy of hdfs-site.xml on the RM, rather 
> than reading it from the local configuration file. In other words, 
> setting the dfs.client.local.interfaces property in Host A's 
> configuration file causes the Yarn containers on Host B to use same 
> value of the property. This causes the map task to fail, as the 
> container cannot establish a TCP connection to the HDFS. However, on 
> Host B, other commands that access the HDFS (such as "hadoop fs") do 
> work, as they respect the local value of the property.
>
> To illustrate with an example, I start a streaming job from the 
> command line on Host A:
>
> hadoop jar 
> $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar -input 
> hdfs://hosta/linesin/ -output hdfs://hosta/linesout -mapper 
> /home/hadoop/toRecords.pl -reducer /bin/cat
>
> The NodeManager on Host B notes that there was an error starting the 
> container:
>
> 13/12/14 19:38:45 WARN nodemanager.DefaultContainerExecutor: Exception 
> from container-launch with container ID: 
> container_1387067177654_0002_01_000001 and exit code: 1
>
> org.apache.hadoop.util.Shell$ExitCodeException:
>
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>
>         at java.lang.Thread.run(Unknown Source)
>
> On Host B, I open 
> userlogs/application_1387067177654_0002/container_1387067177654_0002_01_000001/syslog 
> and find the following messages (note the DEBUG-level messages which I 
> manually enabled for the DFS client):
>
> 2013-12-14 19:38:32,439 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interfaces [1.2.3.4] with addresses [/1.2.3.4:0]
>
> <cut>
>
> 2013-12-14 19:38:33,085 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> newInfo = LocatedBlocks{
>
>   fileLength=537
>
>   underConstruction=false
>
> blocks=[LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; 
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 
> 1.2.3.4:50010]}]
>
> lastLocatedBlock=LocatedBlock{BP-1911846690-1.2.3.4-1386999495143:blk_1073742317_1493; 
> getBlockSize()=537; corrupt=false; offset=0; locs=[5.6.7.8:50010, 
> 1.2.3.4:50010]}
>
>   isLastBlockComplete=true}
>
> 2013-12-14 19:38:33,088 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Connecting to datanode 5.6.7.8:50010
>
> 2013-12-14 19:38:33,090 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interface /1.2.3.4:0
>
> 2013-12-14 19:38:33,095 WARN [main] org.apache.hadoop.hdfs.DFSClient: 
> Failed to connect to /5.6.7.8:50010 for block, add to deadNodes and 
> continue. java.net.BindException: Cannot assign requested address
>
> Note the failure to bind to 1.2.3.4, as the IP for Node B's local 
> interface is actually 5.6.7.8.
>
> Note that when running other HDFS commands on Host B, Host B's setting 
> for dfs.client.local.interfaces is respected. On host B:
>
> hadoop@nodeb:~$ hadoop fs -ls hdfs://hosta/
>
> 13/12/14 19:45:10 DEBUG hdfs.DFSClient: Using local interfaces 
> [5.6.7.8] with addresses [/5.6.7.8:0]
>
> Found 3 items
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 00:40 
> hdfs://hosta/linesin
>
> drwxr-xr-x   - hadoop supergroup          0 2013-12-14 02:01 
> hdfs://hosta/system
>
> drwx------   - hadoop supergroup          0 2013-12-14 10:31 
> hdfs://hosta/tmp
>
> If I change dfs.client.local.interfaces on Host A to eth0 (without 
> touching the setting on Host B), the syslog mentioned above instead 
> shows the following:
>
> 2013-12-14 22:32:19,686 DEBUG [main] org.apache.hadoop.hdfs.DFSClient: 
> Using local interfaces [eth0] with addresses [/<some IP6 
> address>:0,/5.6.7.8:0]
>
> The job then successfully completes sometimes, but both Host A and 
> Host B will then randomly alternate between the IP4 and IP6 side of 
> their eth0 interfaces, which causes other issues. In other words, 
> changing the dfs.client.local.interfaces setting on Host A to a named 
> adapter caused the Yarn container on Host B to bind to an identically 
> named adapter.
>
> Any ideas on how I can reconfigure the cluster so every container will 
> try to bind to its own interface? I successfully worked around this 
> issue by doing a custom build of HDFS which hardcodes my IP address in 
> the DFSClient, but I am looking for a better long-term solution.
>
> Thanks,
>
> Jeff
>

