Posted to hdfs-dev@hadoop.apache.org by "Elliott Clark (JIRA)" <ji...@apache.org> on 2016/01/20 08:49:39 UTC
[jira] [Created] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size
Elliott Clark created HDFS-9669:
-----------------------------------
Summary: TcpPeerServer should respect ipc.server.listen.queue.size
Key: HDFS-9669
URL: https://issues.apache.org/jira/browse/HDFS-9669
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Elliott Clark
During periods of high traffic we are seeing:
{code}
16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect to /10.138.178.47:50010 for file /MYPATH/MYFILE for block BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
{code}
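When this happens, there are far fewer xceivers than configured.
TcpPeerServer does not use ipc.server.listen.queue.size as its listen backlog, so on most JDKs the backlog falls back to the default of 50 pending connections. This effectively means that any combination of GC pauses and busy periods will overflow the backlog and result in TCP resets.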
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370
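Below is a minimal sketch of what the summary asks for. It reads ipc.server.listen.queue.size and passes that value explicitly to ServerSocket.bind(SocketAddress, int), so the JDK never substitutes its default of 50. The sketch assumes the setting comes from a plain java.util.Properties object rather than Hadoop's Configuration. The class name BacklogSketch, its helper methods, and the fallback value of 128 are all hypothetical. None of this is TcpPeerServer's actual code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Properties;

final class BacklogSketch {
    // Configuration key named in this issue.
    static final String LISTEN_QUEUE_SIZE_KEY = "ipc.server.listen.queue.size";
    // Hypothetical fallback used only by this sketch when the key is unset.
    static final int FALLBACK_BACKLOG = 128;

    // Hypothetical helper: configuration is modeled as java.util.Properties,
    // not Hadoop's Configuration class.
    static int listenBacklog(Properties conf) {
        String value = conf.getProperty(LISTEN_QUEUE_SIZE_KEY);
        return value == null ? FALLBACK_BACKLOG : Integer.parseInt(value.trim());
    }

    // Hypothetical helper: binds with an explicit backlog. If the backlog
    // passed to bind() were < 1, the JDK would silently use 50 instead.
    static ServerSocket bindWithBacklog(InetSocketAddress addr, Properties conf)
            throws IOException {
        ServerSocket socket = new ServerSocket(); // created unbound
        try {
            socket.bind(addr, listenBacklog(conf));
        } catch (IOException e) {
            socket.close(); // do not leak the socket if bind fails
            throw e;
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty(LISTEN_QUEUE_SIZE_KEY, "512");
        // Port 0 asks the OS for any free ephemeral port.
        try (ServerSocket s = bindWithBacklog(new InetSocketAddress("127.0.0.1", 0), conf)) {
            System.out.println("bound to port " + s.getLocalPort()
                    + " with requested backlog " + listenBacklog(conf));
        }
    }
}
```

The backlog passed to bind() is only a request. On Linux the kernel silently caps it at net.core.somaxconn, so that sysctl may also need to be raised before a larger queue takes effect.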
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)