Posted to user@hadoop.apache.org by Ja Sam <pt...@gmail.com> on 2015/06/24 11:24:54 UTC
Hadoop doesn't work after restart
I had a running Hadoop cluster (version 2.2.0.2.0.6.0-76 from Hortonworks).
Yesterday a lot of things happened, and at some point we decided to
reboot all datanodes one by one. Unfortunately, the operator did not watch
the namenode health monitor.
As a result, all datanodes show as dead nodes, all blocks are lost,
and so on.
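For reference, the NameNode lists a datanode as dead only after it has gone without a heartbeat for an expiry interval. In the Hadoop 2.x source (`DatanodeManager`) that interval is derived from `dfs.namenode.heartbeat.recheck-interval` (default 300000 ms) and `dfs.heartbeat.interval` (default 3 s). A small sketch of the arithmetic, assuming this cluster still uses the defaults:

```python
def heartbeat_expire_ms(heartbeat_interval_s: int = 3,
                        recheck_interval_ms: int = 300_000) -> int:
    """Time since the last heartbeat after which the NameNode marks a datanode dead.

    Mirrors the Hadoop 2.x formula: 2 * recheck interval + 10 * heartbeat interval,
    with the heartbeat interval converted from seconds to milliseconds.
    Defaults are assumed to match hdfs-default.xml.
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


expire_ms = heartbeat_expire_ms()
print(f"datanode declared dead after {expire_ms / 60_000:.1f} minutes")
```

With the defaults this works out to 10.5 minutes. A datanode that is still reported dead long after it has been back up for that long is more likely failing to register with or reach the namenode than simply missing heartbeats during its reboot.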
We rebooted one of the datanodes once more to see whether it would log
anything interesting. Its log ended with:
INFO ipc.Server (Server.java:run(861)) - IPC Server Responder: starting
INFO ipc.Server (Server.java:run(688)) - IPC Server listener on 8010: starting
and then it hangs. Meanwhile, on the namenode I see only two types of
messages:
INFO hdfs.StateChange (FSNamesystem.java:completeFile(2805)) - DIR* completeFile: [SOME PATH] is closed by DFSClient_NONMAPREDUCE_288661168_33
and a lot of these:
WARN blockmanagement.BlockManager (PendingReplicationBlocks.java:pendingReplicationCheck(249)) - PendingReplicationMonitor timed out blk_1074405820_668233
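To see whether the replication timeouts keep hitting the same blocks or an ever-growing set of them, one option is to count the distinct block names in those warnings. A minimal sketch, assuming the namenode log lines have exactly the form quoted above:

```python
import re
from collections import Counter
from typing import Iterable

# Matches the block name in namenode warnings such as:
#   ... PendingReplicationMonitor timed out blk_1074405820_668233
TIMED_OUT = re.compile(r"PendingReplicationMonitor timed out (blk_\d+_\d+)")


def count_timed_out_blocks(lines: Iterable[str]) -> Counter:
    """Count how many replication-timeout warnings mention each block name."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMED_OUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Because the function only iterates over lines, an open file handle of the namenode log can be passed directly; where that log lives depends on the installation. `len(counts)` gives the number of distinct blocks and `counts.most_common(10)` the ones that time out most often.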
Today we decided to restart the namenode and all datanodes. After the
restart, the web UI at http://[server]:50070/dfshealth.jsp answers VERY
slowly. I don't see any errors in the logs except five like the one below:
ERROR datanode.DataNode (DataXceiver.java:run(225)) - maelhd21:50010:DataXceiver error processing WRITE_BLOCK operation src: /node1:33470 dest: /node3:50010
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1037132819-192.168.61.196-1409328081083:blk_1075994366_2257020 already exists in state FINALIZED and thus cannot be created.
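The block identifier in that exception has two parts: the block pool ID, which (as far as I know from the Hadoop 2.x source) is built as `BP-<random>-<namenode IP>-<creation time in ms>` when the namespace is formatted, and the block name `blk_<block ID>_<generation stamp>`. A sketch that splits it up, assuming an IPv4 address and exactly that layout (`BlockRef` and `parse_block_ref` are hypothetical helper names):

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class BlockRef(NamedTuple):
    pool_random: int
    namenode_ip: str
    pool_created: datetime  # when the block pool (the namespace) was formatted
    block_id: int
    generation_stamp: int


# Assumed layout: BP-<random>-<namenode IPv4>-<creation ms>:blk_<id>_<genstamp>
BLOCK_RE = re.compile(r"BP-(\d+)-([\d.]+)-(\d+):blk_(\d+)_(\d+)")


def parse_block_ref(text: str) -> Optional[BlockRef]:
    """Return the first block reference found in a log line, or None."""
    match = BLOCK_RE.search(text)
    if match is None:
        return None
    rand, ip, created_ms, block_id, genstamp = match.groups()
    return BlockRef(
        pool_random=int(rand),
        namenode_ip=ip,
        pool_created=datetime.fromtimestamp(int(created_ms) / 1000, tz=timezone.utc),
        block_id=int(block_id),
        generation_stamp=int(genstamp),
    )
```

All datanodes and the namenode should agree on the block pool ID; if they do not (for example after a namenode reformat), datanode registration is likely to fail, so it is one more thing worth comparing across the nodes.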
3 out of 5 nodes show as live, but refreshing the Hadoop status page takes
more than 10 minutes.
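Since the status page itself takes minutes to render, it may be quicker to ask the namenode for just the counters of interest through its `/jmx` JSON servlet on the same HTTP port. The bean name `Hadoop:service=NameNode,name=FSNamesystemState` and the attribute names below are assumptions based on Hadoop 2.x; attributes that are not present simply come back as `None`:

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed bean and attribute names exposed by the namenode's /jmx servlet (Hadoop 2.x).
BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"
FIELDS = (
    "NumLiveDataNodes",
    "NumDeadDataNodes",
    "PendingReplicationBlocks",
    "UnderReplicatedBlocks",
)


def extract_state(jmx: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the datanode and replication counters out of a /jmx JSON response."""
    for bean in jmx.get("beans", []):
        if bean.get("name") == BEAN:
            return {field: bean.get(field) for field in FIELDS}
    return {}


def fetch_state(host: str, port: int = 50070, timeout: float = 30.0) -> Dict[str, Any]:
    """Query a single MBean instead of rendering the full dfshealth.jsp page."""
    url = f"http://{host}:{port}/jmx?qry={BEAN}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_state(json.load(resp))
```

For example, `fetch_state("[server]")` would return the live and dead datanode counts together with the pending and under-replicated block counts; polling it every few minutes should show whether re-replication is making progress.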
The question of course is: what should I check or do now?
P.S. I asked the same question on StackOverflow:
http://stackoverflow.com/questions/31020877/datanodes-are-cannot-connect-to-namenode
Re: Hadoop doesn't work after restart
Posted by hadoop hive <ha...@gmail.com>.
Try running fsck.
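A sketch of running fsck and pulling out the summary figures, assuming the `hdfs` command is on the PATH and that the report uses the usual Hadoop 2.x layout (`Key:<whitespace>value` summary lines and a final `The filesystem under path '...' is HEALTHY|CORRUPT` line). The parsing is an assumption about the output format, not a stable interface, and `parse_fsck`/`run_fsck` are hypothetical helper names:

```python
import re
import subprocess
from typing import Dict

# Summary lines look like " Corrupt blocks:\t\t0"; the report ends with
# "The filesystem under path '/' is HEALTHY" (or CORRUPT). Assumed Hadoop 2.x format.
SUMMARY_RE = re.compile(r"^\s*([A-Za-z][A-Za-z \-()]*?):\s+(\S.*)$")
VERDICT_RE = re.compile(r"The filesystem under path '(.*)' is (\w+)")


def parse_fsck(report: str) -> Dict[str, str]:
    """Turn an fsck report into {summary key: value}, plus a 'verdict' entry."""
    summary: Dict[str, str] = {}
    for line in report.splitlines():
        verdict = VERDICT_RE.search(line)
        if verdict:
            summary["verdict"] = verdict.group(2)
            continue
        match = SUMMARY_RE.match(line)
        if match:
            summary[match.group(1).strip()] = match.group(2).strip()
    return summary


def run_fsck(path: str = "/") -> Dict[str, str]:
    # fsck appears to exit non-zero for a CORRUPT result, so check=True is not used.
    result = subprocess.run(["hdfs", "fsck", path], capture_output=True, text=True)
    return parse_fsck(result.stdout)
```

`run_fsck("/")["verdict"]` gives the overall verdict, and entries such as `"Corrupt blocks"` or `"Under-replicated blocks"` give the counts; for a CORRUPT result, `hdfs fsck / -list-corruptfileblocks` lists the affected files.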