Posted to mapreduce-user@hadoop.apache.org by cho ju il <tj...@kgrid.co.kr> on 2014/10/10 08:21:38 UTC

Bug??? Under-Replicated Blocks.

Hadoop 2.4.1.
A datanode disk failed, but 'Number of Under-Replicated Blocks' stays at zero.
If a second disk fails, the file will be lost ( CORRUPT ).
How do I fix it?
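
For cross-checking the web UI numbers, the block and replica state can also be queried from the command line (standard fsck/dfsadmin options in Hadoop 2.x; /t.mp4 is the test file from step 3 below):

bin/hdfs fsck /t.mp4 -files -blocks -locations   # list the live replicas of the test file
bin/hdfs fsck / -list-corruptfileblocks          # blocks the namenode considers corrupt
bin/hdfs dfsadmin -report                        # header shows "Under replicated blocks" and "Missing blocks"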
 
 
1. dfshealth.html ( before the failure )
Configured Capacity:	42.91 TB
DFS Used:	1.86 GB
Non DFS Used:	29.63 TB
DFS Remaining:	13.28 TB
DFS Used%:	0%
DFS Remaining%:	30.94%
Block Pool Used:	1.86 GB
Block Pool Used%:	0%
DataNodes usages% (Min/Median/Max/stdDev):	0.00% / 0.01% / 0.01% / 0.00%
Live Nodes	2 (Decommissioned: 0)
Dead Nodes	0 (Decommissioned: 0)
Decommissioning Nodes	0
Number of Under-Replicated Blocks	0
 
2. chmod 444 /raid0/data01 ( simulate a volume failure )
3. bin/hdfs dfs -get /t.mp4 /tmp/t4.mp4 ( read the file )
4. namenode log ( after the volume failure )
2014-10-10 14:55:21,027 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0): DataNode failed volumes:/raid0/data01/dfs/data/current;
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741848_1024 on 192.168.55.151:40010 size 49940112 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741842_1018 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741844_1020 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741846_1022 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,431 INFO BlockStateChange: BLOCK* processReport: from storage DS-4de98631-ddec-4118-8654-2961b1815230 node DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0), blocks: 4, processing time: 32 msecs
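
The counter the web UI renders can also be read straight from the namenode JMX endpoint, which rules out a stale dfshealth.html page. A quick check (placeholder host name; the default namenode HTTP port 50070 is an assumption, this cluster may use another):

curl 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
# the JSON response includes "UnderReplicatedBlocks" among the FSNamesystemState fields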
 
5. datanode log ( after the volume failure )
2014-10-10 14:55:21,473 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing failed volume /raid0/data01/dfs/data/current: 
org.apache.hadoop.util.DiskChecker$DiskErrorException: Can not create directory: /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/finalized
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:91)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.checkDirTree(LDir.java:160)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:255)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:209)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:168)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1317)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:1421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1117)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:350)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:21,491 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to write dfsUsed to /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/dfsUsed
java.io.FileNotFoundException: /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/dfsUsed (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
        at java.io.FileWriter.<init>(FileWriter.java:73)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:213)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:424)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:252)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:175)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1317)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:1421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1117)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:350)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Completed checkDirs. Removed 1 volumes. Current volumes: [/raid0/data02/dfs/data/current]
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741841 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741843 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741845 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741847 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,495 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed 4 out of 8(took 0 millisecs)
2014-10-10 14:55:21,495 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode.handleDiskError: Keep Running: true
2014-10-10 14:55:22,414 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: b=blk_1073741841_1017, f=/raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/finalized/blk_1073741841
2014-10-10 14:55:22,414 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-1269062812-127.0.0.1-1412645127175:blk_1073741841_1017 received exception java.io.IOException: Block blk_1073741841_1017 is not valid.
2014-10-10 14:55:22,449 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0):Got exception while serving BP-1269062812-127.0.0.1-1412645127175:blk_1073741841_1017 to /192.168.55.151:53669
java.io.IOException: Block blk_1073741841_1017 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:352)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:22,449 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: namenode02:40010:DataXceiver error processing READ_BLOCK operation  src: /192.168.55.151:53669 dst: /192.168.55.151:40010
java.io.IOException: Block blk_1073741841_1017 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:352)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:662)
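
If the disk itself is healthy and the failure was only the permission change from step 2, one recovery path (a sketch, not verified on this cluster) is to restore the permissions and restart the datanode; a 2.4.1 datanode does not re-add a removed volume without a restart, so the replicas on /raid0/data01 only reappear after the node re-registers and sends its next block report:

chmod 755 /raid0/data01
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh start datanode
bin/hdfs fsck /t.mp4 -files -blocks -locations   # confirm the replica is reported again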
 
6. dfshealth.html ( after the failure )
Configured Capacity:	42.91 TB
DFS Used:	1.36 GB
Non DFS Used:	29.62 TB
DFS Remaining:	13.28 TB
DFS Used%:	0%
DFS Remaining%:	30.96%
Block Pool Used:	1.36 GB
Block Pool Used%:	0%
DataNodes usages% (Min/Median/Max/stdDev):	0.00% / 0.00% / 0.00% / 0.00%
Live Nodes	2 (Decommissioned: 0)
Dead Nodes	0 (Decommissioned: 0)
Decommissioning Nodes	0
Number of Under-Replicated Blocks	0
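
For reference, the datanode only keeps running after losing a volume ( "Keep Running: true" in the log above ) when dfs.datanode.failed.volumes.tolerated is raised above its default of 0. The effective value can be checked with getconf (standard in 2.x):

bin/hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated   # default 0: the datanode shuts down on the first failed volume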