Posted to hdfs-dev@hadoop.apache.org by "Gordon Wang (JIRA)" <ji...@apache.org> on 2014/08/01 07:38:39 UTC

[jira] [Created] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

Gordon Wang created HDFS-6804:
---------------------------------

             Summary: race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" 
                 Key: HDFS-6804
                 URL: https://issues.apache.org/jira/browse/HDFS-6804
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.2.0
            Reporter: Gordon Wang


We found the following error in the log of the destination datanode:
{noformat}
2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:744)
{noformat}
Meanwhile, the log on the source datanode shows that the block was transmitted:
{noformat}
2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
{noformat}

When the destination datanode gets the checksum mismatch, it reports a bad block to the NameNode, and the NameNode marks the replica on the source datanode as corrupt. In fact, the replica on the source datanode is valid, because it passes checksum verification.

In short, the replica on the source datanode is wrongly marked as corrupt.
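
To illustrate how a transfer racing with an append could produce these symptoms, here is a minimal Java sketch. It is not the datanode code and makes no claim about the exact root cause. The Replica class, the single whole-replica CRC32, and the order of the four steps in receiverSeesMismatch are all assumptions made for illustration. It only shows that if the sender fixes the number of bytes to send before an append runs, but reads the stored checksum after it, the receiver sees a mismatch while the source replica still verifies locally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumRaceSketch {

    /** Hypothetical replica: data bytes plus one stored checksum over all of them. */
    static final class Replica {
        byte[] data;
        long checksum;

        Replica(byte[] initial) {
            data = initial.clone();
            checksum = crc(data, data.length);
        }

        /** Appender: extend the data, then rewrite the stored checksum. */
        void append(byte[] more) {
            byte[] grown = Arrays.copyOf(data, data.length + more.length);
            System.arraycopy(more, 0, grown, data.length, more.length);
            data = grown;
            checksum = crc(data, data.length);
        }

        /** Local verification of the replica against its own stored checksum. */
        boolean verifies() {
            return checksum == crc(data, data.length);
        }
    }

    /** CRC32 over the first len bytes of buf. */
    static long crc(byte[] buf, int len) {
        CRC32 c = new CRC32();
        c.update(buf, 0, len);
        return c.getValue();
    }

    /**
     * Assumed interleaving: the transfer fixes its byte count first, an append
     * runs, and only then does the transfer read the data and the checksum.
     * Returns true if the receiver's recomputed checksum differs from the
     * checksum it was sent.
     */
    static boolean receiverSeesMismatch(Replica source, byte[] appended) {
        int bytesToSend = source.data.length;                   // 1. sender decides length
        source.append(appended);                                // 2. append interleaves
        byte[] sent = Arrays.copyOf(source.data, bytesToSend);  // 3. sender reads old-length data
        long sentChecksum = source.checksum;                    // 4. sender reads updated checksum
        return sentChecksum != crc(sent, sent.length);          // receiver recomputes and compares
    }

    public static void main(String[] args) {
        Replica source = new Replica("hello".getBytes(StandardCharsets.US_ASCII));
        boolean mismatch =
            receiverSeesMismatch(source, " world".getBytes(StandardCharsets.US_ASCII));
        System.out.println("receiver sees mismatch: " + mismatch);
        System.out.println("source still verifies:  " + source.verifies());
    }
}
```

In this sketch the data and checksum sent to the receiver describe two different states of the replica, so the receiver's report of a bad block says nothing about whether the source replica is actually corrupt.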


