Posted to mapreduce-user@hadoop.apache.org by alo alt <wg...@googlemail.com> on 2012/02/01 12:53:56 UTC
Re: TaskTracker keeps receiving KillJobAction and then deleting unknown jobs while using Hive
How many NameNode handlers (dfs.namenode.handler.count) have you defined for your cluster?
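For reference, dfs.* settings such as the handler count normally live in hdfs-site.xml. The sketch below is not part of Hadoop. HandlerCountCheck is a hypothetical helper that reads a single hdfs-site.xml with the JDK's DOM parser and prints dfs.namenode.handler.count. If the file does not set it, the helper falls back to 10, which I am assuming is the stock default in the 0.20 hdfs-default.xml. It only inspects that one file, so the running NameNode could still be using a value from another configuration resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class HandlerCountCheck {
    // Assumed stock default for dfs.namenode.handler.count (hdfs-default.xml, 0.20).
    static final int ASSUMED_DEFAULT = 10;

    /** Returns the value of the named property, or null if the file does not set it. */
    static String readProperty(InputStream xml, String name) {
        try {
            NodeList props = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml).getElementsByTagName("property");
            String value = null;
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    // Keep scanning: a later definition in the same file wins.
                    value = values.item(0).getTextContent().trim();
                }
            }
            return value;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot read configuration XML", e);
        }
    }

    /** Configured handler count, or the assumed default when the file does not set it. */
    static int handlerCount(InputStream xml) {
        String v = readProperty(xml, "dfs.namenode.handler.count");
        return v == null ? ASSUMED_DEFAULT : Integer.parseInt(v);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real hdfs-site.xml path as args[0].
        Path conf = Paths.get(args.length > 0 ? args[0] : "conf/hdfs-site.xml");
        if (!Files.exists(conf)) {
            System.err.println("not found: " + conf);
            return;
        }
        try (InputStream in = Files.newInputStream(conf)) {
            System.out.println("dfs.namenode.handler.count = " + handlerCount(in));
        }
    }
}
```

Compile it with javac and pass the path to your hdfs-site.xml as the only argument.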
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com
On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
>
> Hi Alex,
>
> I'm using JRE 1.6.0_24
>
> with Hadoop 0.20.0
> and Hive 0.8.0
>
> thx
>
>
> 2012/2/1 alo alt <wg...@googlemail.com>
> Hi,
>
> + hdfs-user (bcc'd)
>
> Which JRE version are you using?
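If it helps, the JVM version can be read from standard system properties. This minimal sketch (the class name JreVersion is only for illustration) prints java.version, java.vendor and java.vm.name. Run it with the same java binary the Hadoop daemons use, i.e. the one under the JAVA_HOME set in conf/hadoop-env.sh; running that binary with -version gives similar information.

```java
public class JreVersion {
    /** Summarises the running JVM using standard system properties. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", java.vm.name=" + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```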
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
>
>> Hi,
>>
>>
>> I'm using Hive to do some log analysis and have run into a problem.
>>
>> My cluster has 3 nodes: one runs the NameNode/JobTracker and the other two run DataNode/TaskTracker.
>>
>> One of the TaskTrackers repeatedly receives a KillJobAction and then deletes the unknown job.
>>
>> The logs look like this:
>>
>> 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
>> 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
>> 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
>> 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
>> 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
>> 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
>> 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
>> 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
>>
>> This happens occasionally, and when it does, this TaskTracker does nothing but keep receiving KillJobActions and deleting unknown jobs, so its performance drops.
>>
>> To work around this, I have to restart the cluster, but obviously this is not a good solution.
>>
>> These jobs eventually run on the other TaskTracker and complete successfully.
>>
>> Has anybody encountered this problem, and can anyone give me some advice?
>>
>> Occasionally there are also error logs like this:
>>
>> 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcher.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>> at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>> at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>> at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>> at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>> at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
>> 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
>> 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
>> 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcher.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>> at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>> at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>> at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>> at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>> at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
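One detail worth checking in these two excerpts is the JobTracker identifier embedded in the IDs. As far as I know, in 0.20 job IDs have the form job_<jtIdentifier>_<sequence> and JVM IDs the form jvm_<jtIdentifier>_<sequence>_<m|r>_<n>, where jtIdentifier is the JobTracker start time formatted as yyyyMMddHHmm. The sketch below relies on that assumption; HadoopIdInfo and its methods are illustrative names, not Hadoop classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HadoopIdInfo {
    // Assumed layouts: job_<jt>_<seq> and jvm_<jt>_<seq>_<m|r>_<n>, where <jt> is
    // the JobTracker start time written as yyyyMMddHHmm (12 digits).
    private static final Pattern ID = Pattern.compile("^(?:job|jvm)_(\\d{12})_(\\d+)");

    private static Matcher match(String id) {
        Matcher m = ID.matcher(id);
        if (!m.find()) {
            throw new IllegalArgumentException("not a job_/jvm_ id: " + id);
        }
        return m;
    }

    /** The JobTracker identifier embedded in the ID, e.g. "201201301055". */
    static String jobTrackerIdentifier(String id) {
        return match(id).group(1);
    }

    /** The per-JobTracker job sequence number, e.g. 381 for ..._0381. */
    static int jobSequence(String id) {
        return Integer.parseInt(match(id).group(2));
    }

    /** Renders the identifier as "yyyy-MM-dd HH:mm" without any time-zone conversion. */
    static String startedAt(String id) {
        String jt = jobTrackerIdentifier(id);
        return jt.substring(0, 4) + "-" + jt.substring(4, 6) + "-" + jt.substring(6, 8)
                + " " + jt.substring(8, 10) + ":" + jt.substring(10, 12);
    }

    /** True if both IDs were issued by the same JobTracker run. */
    static boolean sameJobTracker(String a, String b) {
        return jobTrackerIdentifier(a).equals(jobTrackerIdentifier(b));
    }

    public static void main(String[] args) {
        String[] ids = {"job_201201301055_0381", "jvm_201201311041_0071_r_-386575334"};
        for (String id : ids) {
            System.out.println(id + " -> JobTracker started " + startedAt(id)
                    + ", job #" + jobSequence(id));
        }
        System.out.println("same JobTracker run: " + sameJobTracker(ids[0], ids[1]));
    }
}
```

With the IDs from the two excerpts, it reports a JobTracker start of 2012-01-30 10:55 for job_201201301055_0381 and 2012-01-31 10:41 for the JVM, and prints "same JobTracker run: false". If the assumption about the identifier holds, the KillJobAction lines and the "Connection reset by peer" lines come from different JobTracker runs, so these excerpts alone do not show that they belong to the same incident.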
>>
>> Is there any connection between these two errors?
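One rough way to look for a connection is to check whether the two kinds of messages cluster in time in the same TaskTracker log. The sketch below is hypothetical (LogCorrelator and countNear are illustrative names). It assumes every relevant line starts with a timestamp in the yyyy-MM-dd HH:mm:ss,SSS form seen in the logs above and skips lines without one, such as the exception text and the "at ..." stack frames. It counts lines containing one marker that have a line containing the other marker within a chosen window; the 60-second window in main is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class LogCorrelator {
    /** Parses the leading "yyyy-MM-dd HH:mm:ss,SSS" timestamp, or returns null if absent. */
    static Long timestamp(String line) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        f.setLenient(false);
        Date d = f.parse(line, new ParsePosition(0)); // trailing text is ignored
        return d == null ? null : Long.valueOf(d.getTime());
    }

    /** Counts timestamped lines containing a that have a timestamped line containing b within +/- windowMillis. */
    static int countNear(List<String> lines, String a, String b, long windowMillis) {
        List<Long> aTimes = new ArrayList<>();
        List<Long> bTimes = new ArrayList<>();
        for (String line : lines) {
            Long t = timestamp(line);
            if (t == null) {
                continue; // exception text and "at ..." frames carry no timestamp
            }
            if (line.contains(a)) aTimes.add(t);
            if (line.contains(b)) bTimes.add(t);
        }
        int count = 0;
        for (long ta : aTimes) {
            for (long tb : bTimes) {
                if (Math.abs(ta - tb) <= windowMillis) {
                    count++;
                    break; // one nearby match is enough for this line
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LogCorrelator <tasktracker log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int n = countNear(lines, "Connection reset by peer", "KillJobAction", 60_000);
        System.out.println(n + " 'Connection reset by peer' lines within 60 s of a KillJobAction");
    }
}
```

A high count would only show that the messages coincide, not that one causes the other.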
>>
>> Thank you very much!
>>
>> xiaobin
>
>