Posted to common-user@hadoop.apache.org by jiang licht <li...@yahoo.com> on 2010/02/25 21:17:43 UTC

Hadoop freeze?

I ran into the following problem while running a Hadoop job written in Pig. Please help me figure out what caused it. As far as I can tell, the JobTracker/TaskTracker failed for some reason, but the NameNode and DataNodes are still functioning.

The job seems to make no progress at all (no output, no log), although a couple of other Hadoop jobs ran successfully before this one. hadoop fs -ls can still list files. But when I ran "hadoop job -list", it took a long time and then failed with the following error message.

Exception in thread "main" java.io.IOException: Call to hostname/ip-address:50002 failed on local exception: Connection reset by peer
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:435)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.run(JobClient.java:1512)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1727)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
    at sun.nio.ch.IOUtil.read(IOUtil.java:206)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:271)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:493)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

The JobTracker web interface on port 50030 did not respond at all.

Checking with netstat, port 50030 sometimes shows up and sometimes does not. The connections and ports for the data nodes are shown there.
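
For anyone who wants to repeat this check without reading netstat output by eye, here is a minimal sketch in Python that tries to open a TCP connection to the two JobTracker ports mentioned in this post (50002 for RPC, 50030 for the web interface). The host name "jobtracker-host" and the helper name port_is_open are placeholders of mine, not anything from Hadoop. Note that a successful connect only shows that something is accepting connections on the port; since the error above is "Connection reset by peer", the RPC call could still fail after the connection is established.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # "jobtracker-host" is a placeholder; substitute the real JobTracker host.
    host = "jobtracker-host"
    for label, port in (("JobTracker RPC", 50002), ("JobTracker web UI", 50030)):
        state = "accepting connections" if port_is_open(host, port) else "not reachable"
        print(f"{label} port {port}: {state}")
```

Running it a few times in a row should show whether a port really comes and goes, as the netstat output suggested.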

Then, when I ran another Pig script, it failed with the following error:

Error before Pig is launched
----------------------------
ERROR 6009: Failed to create job client:Call to hostname/ip-address:50002 failed on local exception: Connection reset by peer

org.apache.pig.backend.executionengine.ExecException: ERROR 6009: Failed to create job client:Call to hostname/ip-address:50002 failed on local exception: Connection reset by peer
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:217)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:137)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:199)
    at org.apache.pig.PigServer.<init>(PigServer.java:169)
    at org.apache.pig.PigServer.<init>(PigServer.java:158)
    at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:395)
Caused by: java.io.IOException: Call to hostname/ip-address:50002 failed on local exception: Connection reset by peer
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:435)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:429)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:398)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:212)
    ... 6 more
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
    at sun.nio.ch.IOUtil.read(IOUtil.java:206)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:271)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:493)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)
================================================================================

Thanks,

Michael