Posted to hdfs-dev@hadoop.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/01/11 10:22:00 UTC

[jira] [Created] (HDFS-13010) DataNode: Listen queue is always 128

Gopal V created HDFS-13010:
------------------------------

             Summary: DataNode: Listen queue is always 128
                 Key: HDFS-13010
                 URL: https://issues.apache.org/jira/browse/HDFS-13010
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.0.0
            Reporter: Gopal V


DFS write-heavy workloads are failing with errors like the following:

{code}
18/01/11 05:02:34 INFO mapreduce.Job: Task Id : attempt_1515660475578_0007_m_000387_0, Status : FAILED
Error: java.io.IOException: Could not get block locations. Source file "/tmp/tpcds-generate/10000/_temporary/1/_temporary/attempt_1515660475578_0007_m_000387_0/inventory/data-m-00387" - Aborting...block==null
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1477)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

This was tracked down to connections being refused during write-pipeline recovery:

{code}
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
        at org.apache.hadoop.hdfs.DataStreamer$StreamerStreams.<init>(DataStreamer.java:162)
        at org.apache.hadoop.hdfs.DataStreamer.transfer(DataStreamer.java:1450)
        at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1407)
        at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598)
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499)
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667)
{code}

{code}
# ss -tl | grep 50010

LISTEN     0      128        *:50010                    *:*   
{code}

However, the system is configured with a much higher {{net.core.somaxconn}}:

{code}
# sysctl -a | grep somaxconn

net.core.somaxconn = 16000
{code}
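
For context, the kernel caps a socket's accept queue at min(the backlog passed to listen(), {{net.core.somaxconn}}), so raising the sysctl alone does not help when the application requests (or falls back to) a small backlog. Below is a minimal stand-alone Java sketch of binding with an explicit backlog; the port and backlog values are illustrative only, and this is not the DataNode's actual listener code:

{code}
// Stand-alone illustration only -- not the DataNode's listener code.
// The effective accept-queue depth is min(requested backlog, net.core.somaxconn),
// so the backlog argument has to be raised along with the sysctl.
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BacklogExample {
  public static void main(String[] args) throws Exception {
    int backlog = 16000;  // illustrative; the kernel clamps it to net.core.somaxconn
    try (ServerSocketChannel server = ServerSocketChannel.open()) {
      server.bind(new InetSocketAddress(50010), backlog);
      // With an explicit backlog, `ss -tl` reports the clamped value in the Send-Q
      // column for this listener instead of the 128 seen above.
      System.out.println("Listening on 50010 with requested backlog " + backlog);
      try (SocketChannel peer = server.accept()) {
        // accept a single connection so the example terminates
      }
    }
  }
}
{code}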

Yet, the SNMP counters show that connections are being refused because the listen queue is overflowing: {{127 times the listen queue of a socket overflowed}}
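
For reference, that message is what {{netstat -s}} prints for the TcpExt {{ListenOverflows}} counter, which can also be read directly from {{/proc/net/netstat}}. A small stand-alone sketch (Linux only; the file holds header/value line pairs per protocol group):

{code}
// Stand-alone sketch (Linux only): read the TcpExt ListenOverflows / ListenDrops
// counters that `netstat -s` reports as "times the listen queue of a socket overflowed".
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ListenOverflowCounters {
  public static void main(String[] args) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get("/proc/net/netstat"));
    // Lines come in pairs: a header line with counter names, then a line with values.
    for (int i = 0; i + 1 < lines.size(); i += 2) {
      if (!lines.get(i).startsWith("TcpExt:")) {
        continue;
      }
      List<String> names = Arrays.asList(lines.get(i).trim().split("\\s+"));
      String[] values = lines.get(i + 1).trim().split("\\s+");
      int overflows = names.indexOf("ListenOverflows");
      int drops = names.indexOf("ListenDrops");
      if (overflows > 0 && drops > 0) {
        System.out.println("ListenOverflows = " + values[overflows]);
        System.out.println("ListenDrops     = " + values[drops]);
      }
    }
  }
}
{code}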





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org