Posted to hdfs-user@hadoop.apache.org by divye sheth <di...@gmail.com> on 2014/04/29 13:18:11 UTC

Issues while decommissioning a node

Hi Experts,

I am decommissioning one of the nodes in my cluster. All blocks except one
get replicated properly to the other nodes to maintain the replication
factor.
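For context, this is roughly how the decommission was initiated, via the
standard exclude-file mechanism; the exclude-file path and hostname below are
illustrative placeholders, not my actual values:

  # In hdfs-site.xml, dfs.hosts.exclude points at an exclude file:
  #   <property>
  #     <name>dfs.hosts.exclude</name>
  #     <value>/etc/hadoop/conf/dfs.exclude</value>
  #   </property>
  echo "dn-being-decommissioned.example.com" >> /etc/hadoop/conf/dfs.exclude
  # Tell the NameNode to re-read the include/exclude lists; it then marks the
  # node "Decommission In Progress" and schedules re-replication of its blocks.
  hadoop dfsadmin -refreshNodes

For the one remaining block, I get the following exceptions: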

*Source Datanode (One being decommissioned):*

2014-04-29 07:08:31,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.XX:50010, storageID=DS-567173478-1X.X.X.XX-50010-1366295899368, infoPort=50075, ipcPort=50020):Failed to transfer blk_-8120977448166465461_891134 to 1X.X.X.YYY:50010 got java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:323)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:435)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1177)
        at java.lang.Thread.run(Thread.java:662)

*Destination Datanode (where block is supposed to be replicated):*

2014-04-29 07:07:24,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8120977448166465461_891134 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
2014-04-29 07:07:24,179 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.YYY:50010, storageID=DS-1396119779-1X.X.X.YYY-50010-1388728482530, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1229)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:662)
2014-04-29 07:07:34,329 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2476742220921569826_901106
2014-04-29 07:07:43,929 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-8387585272893559369_854112
2014-04-29 07:07:52,329 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5961493296385433904_858037
2014-04-29 07:08:50,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8120977448166465461_891134 src: /1X.X.X.XX:37100 dest: /1X.X.X.YYY:50010
2014-04-29 07:08:50,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8120977448166465461_891134 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
2014-04-29 07:08:50,305 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.YYY:50010, storageID=DS-1396119779-1X.X.X.YYY-50010-1388728482530, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1229)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:662)


How do I overcome these errors? The block is available at its other
locations, and fsck shows the cluster to be in a healthy state.
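
For reference, the check was along these lines (0.20-era fsck and dfsadmin
usage; the grep pattern is just this block's ID):

  # Cluster health plus per-file block lists and replica locations;
  # grep for the problem block to see where its replicas live.
  hadoop fsck / -files -blocks -locations | grep blk_-8120977448166465461
  # Per-datanode status, including decommissioning progress.
  hadoop dfsadmin -report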

I am using Hadoop-0.20-append-r1056497. We are upgrading to the latest
release, but until the upgrade is done I would really appreciate any pointers
for resolving this issue.

Thanks
Divye Sheth