You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Brian Cho <ch...@gmail.com> on 2008/12/18 00:22:03 UTC

java.nio.channels.ClosedSelectorException

Hi,

I've set up a Hadoop cluster but have a problem where multiple
datanodes and tasks stop responding. I first ran into the problem
using 0.19.0, but I also see the problem at 0.18.2. Java version is
1.6.0_11.

Looking at the logs, the first sign of trouble seems to be either
java.nio.channels.ClosedSelectorException or java.io.IOException:
Filesystem closed. I've attached the "last 4KB" from failed map tasks
taken from the web interface (0.18.2), and the end of a snippet of a
datanode log (0.19.0).

Task nodes stop responding after the exception. The datanode does not
fail right away, but rather continues to raise null pointer exceptions
while failing to communicate with the namenode. Eventually the
namenode decides the datanode is Dead. The exceptions are not raised
every time, but are fairly frequent (for example, after running 5
straight m/r jobs, I had 8 failed datanodes). Also, we've seen this
happen on a 16-node setup, as well as a 4-node setup (different
hardware).

Has anyone run into similar problems before? What would be your
suggestion to try and troubleshoot this? I'm pretty much stumped as to
why this is happening.

Thank you,
Brian Cho




====


Task Logs: 'attempt_200812161652_0002_m_000003_0'


stdout logs



stderr logs

Exception in thread
"org.apache.hadoop.dfs.DFSClient$LeaseChecker@a9ae05"
java.nio.channels.ClosedSelectorException
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
	at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
	at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
	at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
	at org.apache.hadoop.ipc.Client.call(Client.java:704)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
	at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
	at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:782)
	at java.lang.Thread.run(Thread.java:619)



syslog logs

2008-12-16 16:54:16,767 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-12-16 16:54:16,886 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2008-12-16 17:09:35,889 WARN org.apache.hadoop.mapred.TaskRunner:
Parent died.  Exiting attempt_200812161652_0002_m_000003_0








Task Logs: 'attempt_200812161652_0002_m_000003_0'


stdout logs



stderr logs

Exception in thread
"org.apache.hadoop.dfs.DFSClient$LeaseChecker@a9ae05"
java.nio.channels.ClosedSelectorException
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
	at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
	at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
	at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
	at org.apache.hadoop.ipc.Client.call(Client.java:704)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
	at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at org.apache.hadoop.dfs.$Proxy1.renewLease(Unknown Source)
	at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:782)
	at java.lang.Thread.run(Thread.java:619)



syslog logs

2008-12-16 16:54:16,767 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-12-16 16:54:16,886 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2008-12-16 17:09:35,889 WARN org.apache.hadoop.mapred.TaskRunner:
Parent died.  Exiting attempt_200812161652_0002_m_000003_0







Task Logs: 'attempt_200812161652_0002_m_000001_1'


stdout logs



stderr logs



syslog logs

2008-12-16 17:03:57,696 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-12-16 17:03:57,840 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2008-12-16 17:06:13,095 WARN org.apache.hadoop.mapred.TaskRunner:
Parent died.  Exiting attempt_200812161652_0002_m_000001_1
2008-12-16 17:06:36,064 WARN org.apache.hadoop.mapred.TaskTracker:
Error running child
java.io.IOException: Filesystem closed
	at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
	at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:59)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2689)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2655)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
	at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:962)
	at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:73)
	at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:270)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:232)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)












DATANODE:








2008-12-14 17:04:10,493 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8593963756771034982_1588 src: /10.1.1.5:39494 dest:
/10.1.2.2:50010
2008-12-14 17:04:11,662 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception:
java.nio.channels.ClosedSelectorException
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
        at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
        at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
	at org.apache.hadoop.ipc.Client.call(Client.java:685)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy4.blockReceived(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:703)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1100)
        at java.lang.Thread.run(Thread.java:619)

2008-12-14 17:04:14,708 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.1.3.4:53498, dest: /10.1.2.2:50010, bytes: 67108864, op:
HDFS_WRITE, cliID: DFSClient_attempt_200812141659_\
0001_r_000003_0, srvID:
DS-1046032264-155.98.39.98-50010-1229273103400, blockid:
blk_-6861247923839756276_1586
2008-12-14 17:04:14,911 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for
block blk_-6861247923839756276_1586 terminating
2008-12-14 17:04:15,784 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_474395800075063500_1588 src: /10.1.2.2:54720 dest: /10.1.2.2:50010
2008-12-14 17:04:16,664 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using
BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2008-12-14 17:04:16,665 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception:
java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
        at org.apache.hadoop.ipc.Client.call(Client.java:686)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy4.sendHeartbeat(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:669)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1100)
        at java.lang.Thread.run(Thread.java:619)

(the null pointer exception is raised periodically every five seconds)