Posted to common-user@hadoop.apache.org by Kamil Rogoń <ka...@cantstopgames.com> on 2012/08/16 11:06:23 UTC

Under-Replicated Blocks

Hello

Sometimes I get a small glitch with replication between HDFS nodes.
The datanodes are online, but one of them is hanging.

Default replication factor:   3
Average block replication:    2.9940512
Corrupt blocks:               0
Missing replicas:             163 (0.19868357 %)
Number of data-nodes:         3
Number of racks:              1

Node   Last Contact  Admin State  Blocks  Failed Volumes
hdfs1  0             In Service   27471   0
hdfs2  2             In Service   27305   0
hdfs3  2             In Service   27401   0


As you can see, the number of blocks is not equal across the nodes.
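
As a rough sanity check of the figures above, here is a minimal Python sketch that back-calculates the block and replica totals from the reported summary. It assumes the missing-replica percentage is taken relative to the replicas that currently exist, and that the average replication is live replicas divided by total blocks. Both are assumptions that happen to fit these numbers, not something confirmed here. The variable names are only illustrative.

```python
# Figures copied from the summary above.
replication_factor = 3
avg_replication = 2.9940512
missing_replicas = 163
missing_pct = 0.19868357  # percent, as printed

# Assumption: percentage = missing replicas / replicas that currently exist.
live_replicas = round(missing_replicas / (missing_pct / 100))

# Assumption: average replication = live replicas / total blocks.
total_blocks = round(live_replicas / avg_replication)

# With every block fully replicated there would be this many replicas.
expected_replicas = total_blocks * replication_factor

print("live replicas:    ", live_replicas)
print("total blocks:     ", total_blocks)
print("expected replicas:", expected_replicas)
print("shortfall:        ", expected_replicas - live_replicas)
```

Under those assumptions the shortfall equals the 163 missing replicas, so the summary is at least internally consistent.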

Generally, the datanodes are working:

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_8238726012137032582_1388695 file /home/hdfs/3/data/current/subdir62/subdir31/blk_8238726012137032582 for deletion
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_8238726012137032582_1388695 at file /home/hdfs/3/data/current/subdir62/subdir31/blk_8238726012137032582

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 72ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 8 ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 27401 blocks got processed in 72 msecs
INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-8356262741701254215_854916

In the namenode logs I only see many lines like this:

WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1(excluded: 192.168.0.101:50010, 192.168.0.102:50010, 192.168.0.103:50010)


Restarting the datanodes helps, but what is the cause?

Thanks for any tips,
k.