Posted to common-user@hadoop.apache.org by Erik Test <er...@gmail.com> on 2010/05/25 22:06:27 UTC

TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Hello All,

I've been unable to resolve this problem on my own so I've decided to ask
for help. I've pasted the logs I have for the DataNode on one of the slave
nodes. The logs for TaskTracker are essentially the same (i.e. same
exception causing a shutdown).

Any suggestions or hints as to what could be causing this problem?

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-05-25 13:59:56,504 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 0 time(s).
2010-05-25 13:59:57,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 1 time(s).
2010-05-25 13:59:58,508 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 2 time(s).
2010-05-25 13:59:59,510 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 3 time(s).
2010-05-25 14:00:00,512 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 4 time(s).
2010-05-25 14:00:01,514 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 5 time(s).
2010-05-25 14:00:02,516 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 6 time(s).
2010-05-25 14:00:03,518 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 7 time(s).
2010-05-25 14:00:04,520 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 8 time(s).
2010-05-25 14:00:05,522 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/192.168.151.1:54310. Already tried 9 time(s).
2010-05-25 14:00:05,525 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master1/192.168.151.1:54310 failed on local exception: java.net.NoRouteToHostException: No route to host
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
        at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:269)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        ... 13 more

2010-05-25 14:00:05,526 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost.localdomain/127.0.0.1


Erik

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Erik Test <er...@gmail.com>.
I'm not sure exactly what to look for so I pasted the entire diagnostics for
each machine in this email.


**** ant -diagnostics on slave ****

------- Ant diagnostics report -------
Apache Ant version 1.8.1 compiled on April 30 2010

-------------------------------------------
 Implementation Version
-------------------------------------------
core tasks     : 1.8.1 in file:/root/apache-ant-1.8.1/lib/ant.jar
optional tasks : 1.8.1 in file:/root/apache-ant-1.8.1/lib/ant-nodeps.jar

-------------------------------------------
 ANT PROPERTIES
-------------------------------------------
ant.version: Apache Ant version 1.8.1 compiled on April 30 2010
ant.java.version: 1.6
ant.core.lib: /root/apache-ant-1.8.1/lib/ant.jar
ant.home: /root/apache-ant-1.8.1

-------------------------------------------
 ANT_HOME/lib jar listing
-------------------------------------------
ant.home: /root/apache-ant-1.8.1
ant-jmf.jar (6740 bytes)
ant-apache-bsf.jar (3932 bytes)
ant-apache-regexp.jar (3762 bytes)
ant-javamail.jar (7897 bytes)
ant-apache-oro.jar (39639 bytes)
ant-junit.jar (97539 bytes)
ant-launcher.jar (12312 bytes)
ant-commons-net.jar (85346 bytes)
ant-jdepend.jar (8289 bytes)
ant.jar (1514270 bytes)
ant-nodeps.jar (409133 bytes)
ant-antlr.jar (5749 bytes)
ant-swing.jar (7556 bytes)
ant-apache-log4j.jar (3060 bytes)
ant-commons-logging.jar (3914 bytes)
ant-apache-xalan2.jar (2296 bytes)
ant-jai.jar (22266 bytes)
ant-apache-bcel.jar (8747 bytes)
ant-apache-resolver.jar (4085 bytes)
ant-jsch.jar (40200 bytes)
ant-testutil.jar (15201 bytes)
ant-netrexx.jar (10393 bytes)

-------------------------------------------
 USER_HOME/.ant/lib jar listing
-------------------------------------------
user.home: /root
No such directory.

-------------------------------------------
 Tasks availability
-------------------------------------------
image : Missing dependency javax.media.jai.PlanarImage
sshexec : Missing dependency com.jcraft.jsch.Logger
wlrun : Not Available (the implementation class is not present)
scp : Missing dependency com.jcraft.jsch.Logger
stlist : Not Available (the implementation class is not present)
sshsession : Missing dependency com.jcraft.jsch.Logger
starteam : Not Available (the implementation class is not present)
stlabel : Not Available (the implementation class is not present)
jdepend : Missing dependency jdepend.xmlui.JDepend
stcheckin : Not Available (the implementation class is not present)
stcheckout : Not Available (the implementation class is not present)
ejbc : Not Available (the implementation class is not present)
wlstop : Not Available (the implementation class is not present)
ddcreator : Not Available (the implementation class is not present)
A task being missing/unavailable should only matter if you are trying to use
it

-------------------------------------------
 org.apache.env.Which diagnostics
-------------------------------------------
Not available.
Download it at http://xml.apache.org/commons/

-------------------------------------------
 XML Parser information
-------------------------------------------
XML Parser : com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl
XML Parser Location: unknown
Namespace-aware parser :
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Namespace-aware parser Location: unknown

-------------------------------------------
 XSLT Processor information
-------------------------------------------
XSLT Processor :
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
XSLT Processor Location: unknown

-------------------------------------------
 System properties
-------------------------------------------
java.runtime.name : Java(TM) SE Runtime Environment
sun.boot.library.path : /usr/java/jdk1.6.0_20/jre/lib/amd64
java.vm.version : 16.3-b01
ant.library.dir : /root/apache-ant-1.8.1/lib
java.vm.vendor : Sun Microsystems Inc.
java.vendor.url : http://java.sun.com/
path.separator : :
java.vm.name : Java HotSpot(TM) 64-Bit Server VM
file.encoding.pkg : sun.io
user.country : US
sun.java.launcher : SUN_STANDARD
sun.os.patch.level : unknown
java.vm.specification.name : Java Virtual Machine Specification
user.dir : /root/apache-ant-1.8.1/bin
java.runtime.version : 1.6.0_20-b02
java.awt.graphicsenv : sun.awt.X11GraphicsEnvironment
java.endorsed.dirs : /usr/java/jdk1.6.0_20/jre/lib/endorsed
os.arch : amd64
java.io.tmpdir : /tmp
line.separator :

java.vm.specification.vendor : Sun Microsystems Inc.
os.name : Linux
ant.home : /root/apache-ant-1.8.1
sun.jnu.encoding : UTF-8
java.library.path :
/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.specification.name : Java Platform API Specification
java.class.version : 50.0
sun.management.compiler : HotSpot 64-Bit Server Compiler
os.version : 2.6.18-128.el5
user.home : /root
user.timezone : US/Central
java.awt.printerjob : sun.print.PSPrinterJob
file.encoding : UTF-8
java.specification.version : 1.6
user.name : root
java.class.path :
/root/apache-ant-1.8.1/lib/ant-launcher.jar:/root/apache-ant-1.8.1/lib/ant-jmf.jar:/root/apache-ant-1.8.1/lib/ant-apache-bsf.jar:/root/apache-ant-1.8.1/lib/ant-apache-regexp.jar:/root/apache-ant-1.8.1/lib/ant-javamail.jar:/root/apache-ant-1.8.1/lib/ant-apache-oro.jar:/root/apache-ant-1.8.1/lib/ant-junit.jar:/root/apache-ant-1.8.1/lib/ant-launcher.jar:/root/apache-ant-1.8.1/lib/ant-commons-net.jar:/root/apache-ant-1.8.1/lib/ant-jdepend.jar:/root/apache-ant-1.8.1/lib/ant.jar:/root/apache-ant-1.8.1/lib/ant-nodeps.jar:/root/apache-ant-1.8.1/lib/ant-antlr.jar:/root/apache-ant-1.8.1/lib/ant-swing.jar:/root/apache-ant-1.8.1/lib/ant-apache-log4j.jar:/root/apache-ant-1.8.1/lib/ant-commons-logging.jar:/root/apache-ant-1.8.1/lib/ant-apache-xalan2.jar:/root/apache-ant-1.8.1/lib/ant-jai.jar:/root/apache-ant-1.8.1/lib/ant-apache-bcel.jar:/root/apache-ant-1.8.1/lib/ant-apache-resolver.jar:/root/apache-ant-1.8.1/lib/ant-jsch.jar:/root/apache-ant-1.8.1/lib/ant-testutil.jar:/root/apache-ant-1.8.1/lib/ant-netrexx.jar:/usr/java/jdk1.6.0_20/lib/tools.jar
java.vm.specification.version : 1.0
sun.arch.data.model : 64
java.home : /usr/java/jdk1.6.0_20/jre
java.specification.vendor : Sun Microsystems Inc.
user.language : en
java.vm.info : mixed mode
java.version : 1.6.0_20
java.ext.dirs : /usr/java/jdk1.6.0_20/jre/lib/ext:/usr/java/packages/lib/ext
sun.boot.class.path :
/usr/java/jdk1.6.0_20/jre/lib/resources.jar:/usr/java/jdk1.6.0_20/jre/lib/rt.jar:/usr/java/jdk1.6.0_20/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_20/jre/lib/jsse.jar:/usr/java/jdk1.6.0_20/jre/lib/jce.jar:/usr/java/jdk1.6.0_20/jre/lib/charsets.jar:/usr/java/jdk1.6.0_20/jre/classes
java.vendor : Sun Microsystems Inc.
file.separator : /
java.vendor.url.bug : http://java.sun.com/cgi-bin/bugreport.cgi
sun.cpu.endian : little
sun.io.unicode.encoding : UnicodeLittle
sun.cpu.isalist :

-------------------------------------------
 Temp dir
-------------------------------------------
Temp dir is /tmp
Temp dir is writeable
Temp dir alignment with system clock is -246 ms

-------------------------------------------
 Locale information
-------------------------------------------
Timezone Central Standard Time offset=-18000000

-------------------------------------------
 Proxy information
-------------------------------------------
Java1.5+ proxy settings:
Direct connection

**** ant -diagnostics on master ****
------- Ant diagnostics report -------
Apache Ant version 1.8.1 compiled on April 30 2010

-------------------------------------------
 Implementation Version
-------------------------------------------
core tasks     : 1.8.1 in file:/root/apache-ant-1.8.1/lib/ant.jar
optional tasks : 1.8.1 in file:/root/apache-ant-1.8.1/lib/ant-nodeps.jar

-------------------------------------------
 ANT PROPERTIES
-------------------------------------------
ant.version: Apache Ant version 1.8.1 compiled on April 30 2010
ant.java.version: 1.6
ant.core.lib: /root/apache-ant-1.8.1/lib/ant.jar
ant.home: /root/apache-ant-1.8.1

-------------------------------------------
 ANT_HOME/lib jar listing
-------------------------------------------
ant.home: /root/apache-ant-1.8.1
ant-testutil.jar (15201 bytes)
ant-jmf.jar (6740 bytes)
ant-swing.jar (7556 bytes)
ant-apache-bsf.jar (3932 bytes)
ant-antlr.jar (5749 bytes)
ant-commons-net.jar (85346 bytes)
ant.jar (1514270 bytes)
ant-apache-log4j.jar (3060 bytes)
ant-apache-regexp.jar (3762 bytes)
ant-apache-xalan2.jar (2296 bytes)
ant-netrexx.jar (10393 bytes)
ant-jai.jar (22266 bytes)
ant-junit.jar (97539 bytes)
ant-javamail.jar (7897 bytes)
ant-launcher.jar (12312 bytes)
ant-jsch.jar (40200 bytes)
ant-jdepend.jar (8289 bytes)
ant-apache-resolver.jar (4085 bytes)
ant-nodeps.jar (409133 bytes)
ant-apache-bcel.jar (8747 bytes)
ant-apache-oro.jar (39639 bytes)
ant-commons-logging.jar (3914 bytes)

-------------------------------------------
 USER_HOME/.ant/lib jar listing
-------------------------------------------
user.home: /root
No such directory.

-------------------------------------------
 Tasks availability
-------------------------------------------
image : Missing dependency javax.media.jai.PlanarImage
sshexec : Missing dependency com.jcraft.jsch.Logger
wlrun : Not Available (the implementation class is not present)
scp : Missing dependency com.jcraft.jsch.Logger
stlist : Not Available (the implementation class is not present)
sshsession : Missing dependency com.jcraft.jsch.Logger
starteam : Not Available (the implementation class is not present)
stlabel : Not Available (the implementation class is not present)
jdepend : Missing dependency jdepend.xmlui.JDepend
stcheckin : Not Available (the implementation class is not present)
stcheckout : Not Available (the implementation class is not present)
ejbc : Not Available (the implementation class is not present)
wlstop : Not Available (the implementation class is not present)
ddcreator : Not Available (the implementation class is not present)
A task being missing/unavailable should only matter if you are trying to use
it

-------------------------------------------
 org.apache.env.Which diagnostics
-------------------------------------------
Not available.
Download it at http://xml.apache.org/commons/

-------------------------------------------
 XML Parser information
-------------------------------------------
XML Parser : com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl
XML Parser Location: unknown
Namespace-aware parser :
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Namespace-aware parser Location: unknown

-------------------------------------------
 XSLT Processor information
-------------------------------------------
XSLT Processor :
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
XSLT Processor Location: unknown

-------------------------------------------
 System properties
-------------------------------------------
java.runtime.name : Java(TM) SE Runtime Environment
sun.boot.library.path : /usr/java/jdk1.6.0_20/jre/lib/amd64
java.vm.version : 16.3-b01
ant.library.dir : /root/apache-ant-1.8.1/lib
java.vm.vendor : Sun Microsystems Inc.
java.vendor.url : http://java.sun.com/
path.separator : :
java.vm.name : Java HotSpot(TM) 64-Bit Server VM
file.encoding.pkg : sun.io
user.country : US
sun.java.launcher : SUN_STANDARD
sun.os.patch.level : unknown
java.vm.specification.name : Java Virtual Machine Specification
user.dir : /root/apache-ant-1.8.1/bin
java.runtime.version : 1.6.0_20-b02
java.awt.graphicsenv : sun.awt.X11GraphicsEnvironment
java.endorsed.dirs : /usr/java/jdk1.6.0_20/jre/lib/endorsed
os.arch : amd64
java.io.tmpdir : /tmp
line.separator :

java.vm.specification.vendor : Sun Microsystems Inc.
os.name : Linux
ant.home : /root/apache-ant-1.8.1
sun.jnu.encoding : UTF-8
java.library.path :
/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.specification.name : Java Platform API Specification
java.class.version : 50.0
sun.management.compiler : HotSpot 64-Bit Server Compiler
os.version : 2.6.18-128.el5
user.home : /root
user.timezone : America/Chicago
java.awt.printerjob : sun.print.PSPrinterJob
file.encoding : UTF-8
java.specification.version : 1.6
user.name : root
java.class.path :
/root/apache-ant-1.8.1/lib/ant-launcher.jar:/root/apache-ant-1.8.1/lib/ant-testutil.jar:/root/apache-ant-1.8.1/lib/ant-jmf.jar:/root/apache-ant-1.8.1/lib/ant-swing.jar:/root/apache-ant-1.8.1/lib/ant-apache-bsf.jar:/root/apache-ant-1.8.1/lib/ant-antlr.jar:/root/apache-ant-1.8.1/lib/ant-commons-net.jar:/root/apache-ant-1.8.1/lib/ant.jar:/root/apache-ant-1.8.1/lib/ant-apache-log4j.jar:/root/apache-ant-1.8.1/lib/ant-apache-regexp.jar:/root/apache-ant-1.8.1/lib/ant-apache-xalan2.jar:/root/apache-ant-1.8.1/lib/ant-netrexx.jar:/root/apache-ant-1.8.1/lib/ant-jai.jar:/root/apache-ant-1.8.1/lib/ant-junit.jar:/root/apache-ant-1.8.1/lib/ant-javamail.jar:/root/apache-ant-1.8.1/lib/ant-launcher.jar:/root/apache-ant-1.8.1/lib/ant-jsch.jar:/root/apache-ant-1.8.1/lib/ant-jdepend.jar:/root/apache-ant-1.8.1/lib/ant-apache-resolver.jar:/root/apache-ant-1.8.1/lib/ant-nodeps.jar:/root/apache-ant-1.8.1/lib/ant-apache-bcel.jar:/root/apache-ant-1.8.1/lib/ant-apache-oro.jar:/root/apache-ant-1.8.1/lib/ant-commons-logging.jar:/usr/java/jdk1.6.0_20/lib/tools.jar
java.vm.specification.version : 1.0
sun.arch.data.model : 64
java.home : /usr/java/jdk1.6.0_20/jre
java.specification.vendor : Sun Microsystems Inc.
user.language : en
java.vm.info : mixed mode
java.version : 1.6.0_20
java.ext.dirs : /usr/java/jdk1.6.0_20/jre/lib/ext:/usr/java/packages/lib/ext
sun.boot.class.path :
/usr/java/jdk1.6.0_20/jre/lib/resources.jar:/usr/java/jdk1.6.0_20/jre/lib/rt.jar:/usr/java/jdk1.6.0_20/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_20/jre/lib/jsse.jar:/usr/java/jdk1.6.0_20/jre/lib/jce.jar:/usr/java/jdk1.6.0_20/jre/lib/charsets.jar:/usr/java/jdk1.6.0_20/jre/classes
java.vendor : Sun Microsystems Inc.
file.separator : /
java.vendor.url.bug : http://java.sun.com/cgi-bin/bugreport.cgi
sun.cpu.endian : little
sun.io.unicode.encoding : UnicodeLittle
sun.cpu.isalist :

-------------------------------------------
 Temp dir
-------------------------------------------
Temp dir is /tmp
Temp dir is writeable
Temp dir alignment with system clock is -354 ms

-------------------------------------------
 Locale information
-------------------------------------------
Timezone Central Standard Time offset=-18000000

-------------------------------------------
 Proxy information
-------------------------------------------
Java1.5+ proxy settings:
Direct connection

Erik



Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Steve Loughran <st...@apache.org>.
Erik Test wrote:
> I'm able to ssh and ping from the slave node to the master node w/o
> problems. I've opened the ports on the master node to receive communication
> from the slave nodes on the port but still no luck. I'm going to try opening
> the port on the slave node to communicate with the master node next.
> 
> The platform I'm on is Red Hat Enterprise Linux 5.
> Erik

Install Ant on the machine and run ant -diagnostics. This gives us a view of
what Java sees of the world, which can be different from the rest of Unix.


Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Erik Test <er...@gmail.com>.
I'm able to ssh and ping from the slave node to the master node w/o
problems. I've opened the ports on the master node to receive communication
from the slave nodes on the port but still no luck. I'm going to try opening
the port on the slave node to communicate with the master node next.

The platform I'm on is Red Hat Enterprise Linux 5.
Erik



Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 26, 2010, at 10:12 AM, Erik Test wrote:

> [PROBLEM SOLVED]
> 
> I'm running on an internal network, so I shut down iptables on two
> internal nodes. I was then able to run one node as a slave and another as a master.
> 

Hadoop has a very bad tendency towards opening up random ephemeral ports and expecting to serve content off of them.  Your best bet is to allow all connections between Hadoop nodes and likely any clients.

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Erik Test <er...@gmail.com>.
[PROBLEM SOLVED]

I'm running on an internal network, so I shut down iptables on two
internal nodes. I was then able to run one node as a slave and another as a master.


Thanks!
Erik



Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Eric Sammer <es...@cloudera.com>.
On Wed, May 26, 2010 at 12:38 PM, Erik Test <er...@gmail.com> wrote:
> I confirmed that the hostname for the machine in the /etc/hosts file points
> to the actual address of the machine and not the local loopback.

Excellent.

> However, I see that the ports reported in the log file are not available in
> the iptables. I'm new to configuring iptables (i.e. I made my first
> configuration changes yesterday) so do I configure the port on the slave
> node as an output chain going to the master node?

It helps to think of it in terms of the direction the packets flow on
initial connect. In other words, DNs connect to the NN 8020 (or
whatever you've configured) for instance, so you'd open the INPUT
chain for 8020 on the NN. You almost always want to allow related and
established connections. Before making significant changes to iptables
(as you can lock yourself out of a box) you should absolutely become
more familiar with it. Check out http://netfilter.org/ for all of the
details and HOWTOs.
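
As a rough illustration of the direction-of-flow point above, the sketch
below only builds and prints iptables command lines for the NameNode host; it
does not run them. The port 54310 comes from the DataNode log earlier in this
thread, while the 192.168.151.0/24 subnet and the helper name
namenode_input_rules are assumptions made for the example. It uses -I rather
than -A so that the rules land ahead of any existing REJECT rule in the INPUT
chain.

```python
def namenode_input_rules(port, source_cidr):
    """Build iptables argument lists for the NameNode host; nothing is executed."""
    return [
        # Let packets for connections that are already open (or related) back in.
        ["iptables", "-I", "INPUT", "-m", "state",
         "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Accept new TCP connections from the cluster subnet to the NameNode port.
        ["iptables", "-I", "INPUT", "-p", "tcp", "-s", source_cidr,
         "--dport", str(port), "-m", "state", "--state", "NEW", "-j", "ACCEPT"],
    ]


# Assumed values: port from the log in this thread, subnet guessed from the
# master's address 192.168.151.1.
for rule in namenode_input_rules(54310, "192.168.151.0/24"):
    print(" ".join(rule))
```

Review the printed commands before running anything as root; a wrong rule can
lock you out of the box, as noted above.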

In the interim, if your Hadoop nodes are on a trusted internal
network, you may want to make sure things work without iptables first,
then add it when you have rules that make sense.

Regards.
-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Steve Loughran <st...@apache.org>.
Erik Test wrote:
> I confirmed that the hostname for the machine in the /etc/hosts file points
> to the actual address of the machine and not the local loopback.
> 
> However, I see that the ports reported in the log file are not available in
> the iptables. I'm new to configuring iptables (i.e. I made my first
> configuration changes yesterday) so do I configure the port on the slave
> node as an output chain going to the master node?
> Erik

I know nothing about iptables either. I do know a bad /etc/resolv.conf
file breaks a lot of Java.

Nothing obviously bad stands out from the ant diagnostics call, though
it needs more network debugging. The Java version seems OK.

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Erik Test <er...@gmail.com>.
I confirmed that the hostname for the machine in the /etc/hosts file points
to the actual address of the machine and not the local loopback.

However, I see that the ports reported in the log file are not available in
the iptables. I'm new to configuring iptables (i.e. I made my first
configuration changes yesterday) so do I configure the port on the slave
node as an output chain going to the master node?
Erik



Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Eric Sammer <es...@cloudera.com>.
This is usually due to a misconfigured hosts file. Make sure the
hostname of the machine (reported by 'hostname') does not appear in
the loopback line in /etc/hosts. RHEL really likes to put it there,
and that is bad because the machine then resolves its own name to
127.0.0.1. Also confirm that iptables is not running or that the proper
ports are allowed between hosts.
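
A minimal sketch of that hosts-file check, assuming the file follows the
usual "address name [alias...]" layout. The function names and the sample host
name slave1 used in the comments are hypothetical; only the check itself (the
machine's hostname must not sit on a loopback line) comes from the advice
above.

```python
import ipaddress


def loopback_names(hosts_text):
    """Return the set of names that a hosts file maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a plain IP address
        if address.is_loopback:
            names.update(fields[1:])
    return names


def hostname_on_loopback(hosts_text, hostname):
    """True if hostname, or its short form, appears on a loopback line."""
    short = hostname.split(".", 1)[0]
    found = loopback_names(hosts_text)
    return hostname in found or short in found


# Typical use on a node (reads the real file, so it is left as a comment):
#   import socket
#   print(hostname_on_loopback(open("/etc/hosts").read(), socket.gethostname()))
```

A True result for a node such as slave1 would mean a line like
"127.0.0.1 slave1 localhost.localdomain localhost" is present and should be
split so that slave1 sits on the line with the node's real address.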

On Wed, May 26, 2010 at 5:56 AM, Steve Loughran <st...@apache.org> wrote:
> Hemanth Yamijala wrote:
>>
>> Erik,
>>
>>> I've been unable to resolve this problem on my own so I've decided to ask
>>> for help. I've pasted the logs I have for the DataNode on one of the slave
>>> nodes. The logs for TaskTracker are essentially the same (i.e. same
>>> exception causing a shutdown).
>>>
>>> Any suggestions or hints as to what could be causing this problem?
>>>
>>> STARTUP_MSG: Starting DataNode
>>> STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.20.2
>>> STARTUP_MSG:   build =
>>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
>
> That looks to me like the DN doesn't know who it is or where; check its
> networking.
>



-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Steve Loughran <st...@apache.org>.
Hemanth Yamijala wrote:
> Erik,
> 
>> I've been unable to resolve this problem on my own so I've decided to ask
>> for help. I've pasted the logs I have for the DataNode on one of the slave
>> nodes. The logs for TaskTracker are essentially the same (i.e. same
>> exception causing a shutdown).
>>
>> Any suggestions or hints as to what could be causing this problem?
>>
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r

That looks to me like the DN doesn't know who it is or where; check its
networking.
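
One way to check this from the node itself is to ask the resolver what the
machine's own hostname maps to. The sketch below is a Python approximation of
that lookup, not the code Hadoop runs, and the function names are made up for
the example. A loopback answer would match the
host = localhost.localdomain/127.0.0.1 line in the startup banner.

```python
import ipaddress
import socket


def is_loopback(address):
    """True for addresses in 127.0.0.0/8 and for ::1."""
    return ipaddress.ip_address(address).is_loopback


def describe_self():
    """Return (hostname, resolved address, loopback flag) as the resolver sees it."""
    name = socket.gethostname()
    try:
        address = socket.gethostbyname(name)
    except socket.gaierror:
        return name, None, False  # the hostname does not resolve at all
    return name, address, is_loopback(address)


# Typical use on a node:
#   print(describe_self())
```

If the flag comes back True, other nodes cannot reach this machine under the
address it reports for itself, so /etc/hosts and DNS are the first places to
look.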

Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)

Posted by Hemanth Yamijala <yh...@gmail.com>.
Erik,

>
> I've been unable to resolve this problem on my own so I've decided to ask
> for help. I've pasted the logs I have for the DataNode on one of the slave
> nodes. The logs for TaskTracker are essentially the same (i.e. same
> exception causing a shutdown).
>
> Any suggestions or hints as to what could be causing this problem?
>
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2010-05-25 13:59:56,504 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 0 time(s).
> 2010-05-25 13:59:57,506 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 1 time(s).
> 2010-05-25 13:59:58,508 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 2 time(s).
> 2010-05-25 13:59:59,510 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 3 time(s).
> 2010-05-25 14:00:00,512 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 4 time(s).
> 2010-05-25 14:00:01,514 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 5 time(s).
> 2010-05-25 14:00:02,516 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 6 time(s).
> 2010-05-25 14:00:03,518 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 7 time(s).
> 2010-05-25 14:00:04,520 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 8 time(s).
> 2010-05-25 14:00:05,522 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master1/192.168.151.1:54310. Already tried 9 time(s).
> 2010-05-25 14:00:05,525 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call
> to master1/192.168.151.1:54310 failed on local exception:
> java.net.NoRouteToHostException: No route to host

Can you connect to 192.168.151.1 from the host running your datanode,
outside of Hadoop? It seems like it is not reachable. Also, what
platform are you running this on?
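
A quick way to run that test without Hadoop in the picture is a plain TCP
connect to the NameNode address and port from the log. This is a sketch only;
the function name probe and the outcome labels are invented for the example,
and the remark about firewalls is a common cause on Linux rather than a
diagnosis of this cluster.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and return a short label describing the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout (packets possibly dropped by a firewall)"
    except ConnectionRefusedError:
        return "refused (host reachable, nothing listening on that port)"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition Java reports as NoRouteToHostException.
            return "no route to host (routing problem or a firewall REJECT rule)"
        return "error: %s" % exc
```

Running probe("192.168.151.1", 54310) on the slave exercises the same address
and port that the DataNode retries in the log above. Since ssh and ping can
succeed while a single port is still blocked, testing the exact port is the
useful part.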

