Posted to hdfs-dev@hadoop.apache.org by "PengZhang (JIRA)" <ji...@apache.org> on 2013/04/02 08:53:15 UTC
[jira] [Created] (HDFS-4660) Duplicated checksum on DN in a recovered pipeline
PengZhang created HDFS-4660:
-------------------------------
Summary: Duplicated checksum on DN in a recovered pipeline
Key: HDFS-4660
URL: https://issues.apache.org/jira/browse/HDFS-4660
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.0.3-alpha
Reporter: PengZhang
Priority: Critical
Pipeline: DN1 DN2 DN3
Stop DN2.
Add a node DN4 in the 2nd position of the pipeline: DN1 DN4 DN3
Recover RBW.
DN4 after RBW recovery:
2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
getNumBytes() = 134144
getBytesOnDisk() = 134144
getVisibleLength()= 134144
end at chunk (134144/512=262)
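The chunk arithmetic above can be sketched as follows, using the HDFS defaults of 512 bytes per checksum chunk and 4-byte CRCs (the class and method names here are mine, not Hadoop's):

```java
public class ChunkMath {
    static final int BYTES_PER_CHECKSUM = 512; // HDFS default chunk size

    // Number of complete 512-byte chunks covered by a replica of this length.
    static long fullChunks(long numBytes) {
        return numBytes / BYTES_PER_CHECKSUM;
    }

    public static void main(String[] args) {
        // DN4's replica ends exactly on a chunk boundary:
        System.out.println(ChunkMath.fullChunks(134144)); // 262 full chunks, no trailing bytes
        // DN3's replica ends mid-chunk:
        System.out.println(ChunkMath.fullChunks(134028)); // 261 full chunks + 396 trailing bytes
    }
}
```

So after RBW recovery the two datanodes disagree: DN4 already holds all of chunk 261, while DN3 holds only 396 bytes of it.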
DN3 after RBW recovery:
2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
getNumBytes() = 134028
getBytesOnDisk() = 134028
getVisibleLength()= 134028
Client sends a packet after pipeline recovery:
offset=133632 len=1008
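The resent packet starts on a chunk boundary (133632 = 261 × 512) and overlaps data DN4 already has on disk. A small sketch of the overlap arithmetic (variable names are illustrative, not the actual BlockReceiver fields):

```java
public class PacketOverlap {
    // Bytes of the incoming packet that this datanode already has on disk.
    static long overlap(long onDiskLen, long offsetInBlock) {
        return Math.max(0, onDiskLen - offsetInBlock);
    }

    public static void main(String[] args) {
        long onDiskLen = 134144;     // DN4's bytes on disk after RBW recovery
        long offsetInBlock = 133632; // resent packet's offset (= 261 * 512)
        int dataLen = 1008;          // resent packet's payload length

        System.out.println(PacketOverlap.overlap(onDiskLen, offsetInBlock)); // 512: one full chunk
        System.out.println(offsetInBlock + dataLen);                         // 134640: new block end
    }
}
```

DN4 therefore needs to skip exactly one full chunk (512 data bytes) of the packet, while DN3, which has only 134028 bytes, needs the whole payload.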
DN4 after flush
2013-04-01 21:02:31,779 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1063
// meta end position should be ceil(134640/512)*4 + 7 == 263*4 + 7 == 1059, but it is actually 1063.
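The expected meta offset follows from the block length: a 7-byte meta header plus one 4-byte CRC per chunk, including the trailing partial chunk. A sketch of that calculation (my names, not Hadoop's, assuming the default CRC32 checksum):

```java
public class MetaOffset {
    static final int BYTES_PER_CHECKSUM = 512; // HDFS default chunk size
    static final int CHECKSUM_SIZE = 4;        // CRC32 checksum is 4 bytes
    static final int HEADER_SIZE = 7;          // version (2) + checksum type (1) + bytesPerChecksum (4)

    // Meta file offset after flushing a block of blockLen bytes:
    // one CRC per chunk, counting the trailing partial chunk.
    static long metaOffset(long blockLen) {
        long chunks = (blockLen + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM; // ceil
        return HEADER_SIZE + chunks * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(MetaOffset.metaOffset(134640)); // 1059 = 7 + 263 * 4
        // DN4's observed offset of 1063 is exactly one 4-byte checksum too large.
    }
}
```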
DN3 after flush
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, lastPacketInBlock=false, offsetInBlock=134640, ackEnqueueNanoTime=8817026136871545)
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing meta file offset of block BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 1055 to 1051
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1059
After checking the meta file on DN4, I found that the checksum of chunk 262 is duplicated, but the data is not.
Later, after the block was finalized, DN4's block scanner detected the bad block and reported it to the NameNode. The NameNode sent a command to delete this block and re-replicate it from another DN in the pipeline to satisfy the replication factor.
I think this happens because BlockReceiver skips data bytes that are already written to disk, but does not skip the corresponding checksum bytes. And the function adjustCrcFilePosition is only used for the last non-complete chunk, not for this situation.
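A hedged sketch of the idea behind a fix (the names below are illustrative, not the real BlockReceiver members): when a packet overlaps bytes already on disk, the receiver should skip the matching checksum bytes as well as the data bytes, instead of re-appending a CRC for a chunk it already checksummed.

```java
public class SkipSketch {
    static final int BYTES_PER_CHECKSUM = 512; // HDFS default chunk size
    static final int CHECKSUM_SIZE = 4;        // CRC32 checksum is 4 bytes

    // How many checksum bytes of the packet to skip, given that the
    // data overlap is chunk-aligned (the packet starts on a chunk boundary).
    static long checksumBytesToSkip(long onDiskLen, long offsetInBlock) {
        long skippedData = Math.max(0, onDiskLen - offsetInBlock);
        long skippedChunks = skippedData / BYTES_PER_CHECKSUM;
        return skippedChunks * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        // DN4's case: 512 data bytes of the packet are already on disk,
        // so one 4-byte CRC must be skipped too instead of written again.
        System.out.println(SkipSketch.checksumBytesToSkip(134144, 133632)); // 4
    }
}
```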
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira