
Posted to user@hive.apache.org by Xiaobin She <xi...@gmail.com> on 2012/02/01 08:16:53 UTC

tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Hi,

I'm using Hive to do some log analysis, and I have encountered a problem.

My cluster has 3 nodes: one for NameNode/JobTracker and the other two for
DataNode/TaskTracker.

One of the TaskTrackers repeatedly receives KillJobAction and then
deletes unknown jobs.

The logs look like this:

2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.

This happens occasionally. When it does, this TaskTracker does nothing
but keep receiving KillJobAction and deleting unknown jobs, so
performance drops.

To work around the problem I have to restart the cluster, but obviously
that is not a good solution.

These jobs eventually run on the other TaskTracker, where they run well
and succeed.

Has anybody encountered this problem, and can you give me some advice?

Occasionally there are also error logs like this:

2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
        at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
        at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)

Is there any connection between these two errors?

Thank you very much!

xiaobin
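
Editor's sketch: the KillJobAction pattern in the log excerpt above can be
summarized per job ID, which makes it easier to see how often the affected
node is told to kill jobs it never ran. This is only a rough illustration.
The regular expressions are derived from the log lines quoted in this post,
the function name summarize_kill_actions is made up for the example, and it
assumes each log record sits on one line (as in the real log file, not as
wrapped by a mail client).

```python
import re
from collections import Counter

# Patterns derived from the TaskTracker log lines quoted in this thread.
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")


def summarize_kill_actions(lines):
    """Return (kills, unknown) Counters keyed by job ID (hypothetical helper)."""
    kills, unknown = Counter(), Counter()
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            kills[m.group(1)] += 1
        m = UNKNOWN_RE.search(line)
        if m:
            unknown[m.group(1)] += 1
    return kills, unknown


# Four lines copied from the post; a real run would read the TaskTracker log.
SAMPLE = """\
2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
""".splitlines()

kills, unknown = summarize_kill_actions(SAMPLE)
# Jobs that were killed on this node although the node did not know them.
never_ran_here = sorted(set(kills) & set(unknown))
print(never_ran_here)
```

To use it on a real node, pass an open log file instead of SAMPLE; the log
path depends on the installation and is not given in this thread.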

Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by alo alt <wg...@googlemail.com>.

How many NameNode handlers (dfs.namenode.handler.count) have you defined for your cluster?

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:

> 
> hi Alex,
> 
> I'm using jre 1.6.0_24
> 
> with hadoop 0.20.0
> hive 0.80
> 
> thx
> 
> 
> 2012/2/1 alo alt <wg...@googlemail.com>
> Hi,
> 
> + hdfs-user (bcc'd)
> 
> Which JRE version do you use?
> 
> - Alex
> 
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
> 

Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by Xiaobin She <xi...@gmail.com>.

Hi Alex,

It seems that this particular TaskTracker fails because there is not
enough disk space on that machine.

Once I cleaned up some disk space, the problem disappeared.

But I still don't understand why.

thx

xiaobin
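
Editor's sketch: since freeing disk space made the problem go away, a small
periodic check on each TaskTracker node could flag the condition before it
recurs. This is a minimal illustration, not a Hadoop feature. The directory
list and the 1 GiB threshold are assumptions chosen for the example; in
practice the list would hold whatever local directories the TaskTracker
writes to on that node.

```python
import shutil


def low_space_dirs(dirs, min_free_bytes):
    """Return [(path, free_bytes)] for directories below the threshold."""
    low = []
    for path in dirs:
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            low.append((path, free))
    return low


# Hypothetical values: replace "." with the node's Hadoop local directories.
LOCAL_DIRS = ["."]
MIN_FREE = 1 * 1024**3  # 1 GiB, an arbitrary illustrative threshold

for path, free in low_space_dirs(LOCAL_DIRS, MIN_FREE):
    print(f"LOW SPACE: {path} has {free / 1024**2:.0f} MiB free")
```

Run from cron or a monitoring agent, a check like this would at least show
whether the KillJobAction loop lines up with the disk filling up.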


Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by alo alt <wg...@googlemail.com>.

A poorly written job can easily overload a TaskTracker. The first question is why one TT has no problems and the other does. Take a look at that node's logs. Do you see messages like "0 slots free"? If so, the handler count could help you.

dfs.namenode.handler.count can be set to 15 or similar. 10 is very moderate.

best,
 Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com
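
Editor's sketch: to confirm which value of dfs.namenode.handler.count is
actually configured, the property can be read from the Hadoop-style
configuration XML. The helper name read_property and the sample file content
are made up for this example; the fallback of 10 is the default mentioned
elsewhere in this thread, not something the code verifies.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name, default=None):
    """Look up a <property> by <name> in Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


# Hypothetical site file content using the value suggested above.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>15</value>
  </property>
</configuration>
"""

configured = read_property(SAMPLE_SITE_XML, "dfs.namenode.handler.count", default="10")
unset = read_property("<configuration/>", "dfs.namenode.handler.count", default="10")
print(configured, unset)
```

When the property is absent from the site file, the helper returns the
supplied default, which mirrors the situation where the value was never set.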


Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by Xiaobin She <xi...@gmail.com>.

Hi Alex,

I did not set dfs.namenode.handler.count in the config file, so it should
be at its default value, which I believe is 10.

I only have two DataNodes. Is 10 not enough?

And if it is not enough, why does the TaskTracker keep receiving
KillJobAction and deleting unknown jobs?

Thank you very much for your help!



Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by alo alt <wg...@googlemail.com>.

How much namenode handler (dfs.namenode.handler.count) you have defined for your cluster?

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:

> 
> hi Alex,
> 
> I'm using jre 1.6.0_24
> 
> with hadoop 0.20.0
> hive 0.80
> 
> thx
> 
> 

Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by Xiaobin She <xi...@gmail.com>.

Hi Alex,

I'm using JRE 1.6.0_24

with Hadoop 0.20.0
Hive 0.8.0

Thanks


2012/2/1 alo alt <wg...@googlemail.com>

> Hi,
>
> + hdfs-user (bcc'd)
>
> Which JRE version are you using?
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>

Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive

Posted by alo alt <wg...@googlemail.com>.

Hi,

+ hdfs-user (bcc'd)

Which JRE version are you using?

- Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

> hi ,
> 
> 
> I'm using Hive to do some log analysis, and I have encountered a problem.
> 
> My cluster has 3 nodes: one for NameNode/JobTracker and the other two for DataNode/TaskTracker.
> 
> One of the TaskTrackers repeatedly receives KillJobAction and then deletes unknown jobs.
> 
> The logs look like this:
> 
> 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.  
> 
> This happens occasionally, and when it does, this TaskTracker does nothing but keep receiving KillJobAction and deleting unknown jobs, so performance drops.
> 
> To work around the problem, I have to restart the cluster,
> but obviously that is not a good solution.
> 
> These jobs eventually run on the other TaskTracker, where they run well and succeed.
> 
> Has anybody encountered this problem, and can you give me some advice?
> 
> Occasionally there are also error logs like this:
> 
> 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
> 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
> 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)  
> 
> Is there some connection between these two errors?
> 
> thank you very much!
> 
> xiaobin
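
To see how often the pattern in the quoted log occurs, the TaskTracker log can be scanned for job IDs that appear in both a 'KillJobAction' line and an "Unknown job ... being deleted" line. The sketch below is an illustration only: the function name unknown_killed_jobs is hypothetical, and the regular expressions assume the message wording shown in the excerpt above, which may differ in other Hadoop versions.

```python
import re

# Patterns based on the message wording in the quoted TaskTracker log excerpt.
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")


def unknown_killed_jobs(log_lines):
    """Return job IDs, in log order, that were killed and also reported as unknown."""
    killed = []
    unknown = set()
    for line in log_lines:
        kill = KILL_RE.search(line)
        if kill and kill.group(1) not in killed:
            killed.append(kill.group(1))
        miss = UNKNOWN_RE.search(line)
        if miss:
            unknown.add(miss.group(1))
    return [job for job in killed if job in unknown]


# Four lines copied from the log excerpt in this thread.
SAMPLE_LOG = [
    "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381",
    "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.",
    "2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383",
    "2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.",
]

if __name__ == "__main__":
    for job_id in unknown_killed_jobs(SAMPLE_LOG):
        print(job_id)
```

Running the same function over the logs of both TaskTrackers would show whether only one node receives kill actions for jobs it never knew about.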
