Posted to hdfs-dev@hadoop.apache.org by "YCozy (Jira)" <ji...@apache.org> on 2020/06/15 19:43:00 UTC

[jira] [Created] (HDFS-15414) java.net.SocketException: Original Exception : java.io.IOException: Broken pipe

YCozy created HDFS-15414:
----------------------------

             Summary: java.net.SocketException: Original Exception : java.io.IOException: Broken pipe
                 Key: HDFS-15414
                 URL: https://issues.apache.org/jira/browse/HDFS-15414
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.10.0
            Reporter: YCozy


We observed this exception in a DataNode's log while we were not shutting down any nodes in the cluster. Specifically, we have a cluster with 3 DataNodes (DN1, DN2, DN3) and 2 NameNodes (NN1, NN2). At some point, this exception appears in DN3's log:
{noformat}
2020-06-08 21:53:03,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:9666, datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, infoSecurePort=0, ipcPort=9667, storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007) Starting thread to transfer BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 127.0.0.1:9766
2020-06-08 21:53:03,373 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:9666, datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, infoSecurePort=0, ipcPort=9667, storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007) Starting thread to transfer BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 127.0.0.1:9766
2020-06-08 21:53:03,381 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /app/dn3/current
2020-06-08 21:53:03,383 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:9666, datanodeUuid=4408ff04-e406-4ccc-bd5c-8516ad57ec21, infoPort=9664, infoSecurePort=0, ipcPort=9667, storageInfo=lv=-57;cid=CID-c816c4ea-a559-4fd5-9b3a-b5994dc3a5fa;nsid=34747155;c=1591653120007):Failed to transfer BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 to 127.0.0.1:9766 got
java.net.SocketException: Original Exception : java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:605)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) 
    at org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:280) 
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:620) 
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:804) 
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:751) 
    at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2469) 
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Broken pipe 
    ... 11 more{noformat}
Port 9766 is DN2's address. 

Around the same time, we observe the following exceptions in DN2's log:
{noformat}
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 src: /127.0.0.1:47618 dest: /127.0.0.1:9766
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 already exists in state FINALIZED and thus cannot be created.
2020-06-08 21:53:03,379 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 007e9b383989:9766:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:47618 dst: /127.0.0.1:9766; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1002 already exists in state FINALIZED and thus cannot be created.{noformat}
However, this exception does not look like the cause of the broken pipe, because DN2 has an earlier occurrence of ReplicaAlreadyExistsException while DN3 only has one occurrence of the broken pipe. Here's the earlier occurrence of ReplicaAlreadyExistsException on DN2:
{noformat}
2020-06-08 21:52:54,438 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 src: /127.0.0.1:47462 dest: /127.0.0.1:9766
2020-06-08 21:52:54,438 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
2020-06-08 21:52:54,448 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 007e9b383989:9766:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:47462 dst: /127.0.0.1:9766; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-553302063-172.17.0.3-1591653120007:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.{noformat}
So we think there is a bug causing the broken pipe.
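For context, the stack trace above shows DN3's DataTransfer thread pushing the replica through FileChannel.transferTo(), and a broken pipe there simply means the remote side (DN2) closed the TCP connection while data was still being written. Below is a minimal, self-contained sketch (plain Java NIO, not HDFS code; port 9766 and the 64 MB file size are arbitrary choices for the demo) of that generic failure mode: the receiver accepts a connection and closes it immediately, roughly what a DataXceiver does when opWriteBlock throws, while the sender keeps streaming a file with transferTo() and ends up with java.io.IOException: Broken pipe.
{noformat}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.StandardOpenOption;

// Standalone repro of the generic failure mode, NOT HDFS code: the receiver
// aborts right after accepting (like a DataXceiver whose opWriteBlock throws),
// while the sender keeps calling FileChannel.transferTo() (the call at the top
// of DN3's stack trace) and hits a broken pipe. Port 9766 mirrors DN2's port
// but is otherwise arbitrary.
public class BrokenPipeSketch {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 9766));

        // "DN2": accept the connection and close it without reading anything.
        Thread receiver = new Thread(() -> {
            try (SocketChannel peer = server.accept()) {
                // reject the "block"
            } catch (IOException ignored) {
            }
        });
        receiver.start();

        // A throwaway 64 MB "replica", large enough that the socket buffers
        // cannot absorb it all before the peer's close takes effect.
        File block = File.createTempFile("blk_demo_", ".data");
        block.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(block)) {
            byte[] chunk = new byte[1 << 20];
            for (int i = 0; i < 64; i++) {
                out.write(chunk);
            }
        }

        // "DN3": stream the replica with transferTo(), as
        // SocketOutputStream.transferToFully() does under BlockSender.sendPacket().
        try (SocketChannel sender =
                     SocketChannel.open(new InetSocketAddress("127.0.0.1", 9766));
             FileChannel fc = FileChannel.open(block.toPath(), StandardOpenOption.READ)) {
            long pos = 0, size = fc.size();
            while (pos < size) {
                pos += fc.transferTo(pos, size - pos, sender);
            }
            System.out.println("transfer completed unexpectedly");
        } catch (IOException e) {
            // Expected: java.io.IOException: Broken pipe (or connection reset)
            System.out.println("sender failed as expected: " + e);
        } finally {
            server.close();
            receiver.join();
        }
    }
}
{noformat}
Since the sender only ever sees this generic broken pipe no matter why the receiver dropped the connection, the DN3 log alone cannot tell us what went wrong on DN2's side.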



