Posted to mapreduce-user@hadoop.apache.org by Steven Xu <xu...@neusoft.com> on 2014/08/29 11:26:07 UTC

[HDFS] DFSClient does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets with HDP 2.1/HBase 0.98.0/Hadoop 2.4.0

Hello Hadoopers,

When I run HDP 2.1/HBase 0.98.0/Hadoop 2.4.0, I consistently hit a fatal
problem: DFSClient does not close a closed socket, resulting in thousands of
CLOSE_WAIT sockets. Has anyone else seen the same issue? If so, please share
your experience. Thanks a lot. I have also filed HDFS-6973 for this.

 

HBase, acting as an HDFS client, does not close dead connections to the
datanode. This results in more than 30K CLOSE_WAIT sockets, and at some point
HBase cannot connect to the datanode because there are too many sockets from
one host to another on the same port, 50010.
Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing:
$ netstat -an|grep CLOSE_WAIT|wc -l
2545
$ netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
2545
$ ps -ef|grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
-Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-XX:+UseConcMarkSweepGC
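
To see which datanode the leaked sockets point at, the same netstat output can be grouped by remote address. The following is only a minimal Java sketch, assuming the usual Linux netstat column layout (proto, recv-q, send-q, local address, foreign address, state). The class name, the method name, and the sample addresses are made up for illustration and are not taken from the cluster above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CloseWaitCounter {

    /**
     * Counts CLOSE_WAIT sockets per remote address.
     * Assumed line layout (Linux netstat -an / -nap):
     *   proto recv-q send-q local-address foreign-address state [pid/program]
     */
    static Map<String, Integer> countCloseWaitByRemote(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header row, non-TCP rows, and rows without a state column.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            if (!"CLOSE_WAIT".equals(cols[5])) {
                continue;
            }
            String remote = cols[4];
            Integer old = counts.get(remote);
            counts.put(remote, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder rows for illustration only.
        List<String> sample = Arrays.asList(
            "Proto Recv-Q Send-Q Local Address    Foreign Address  State",
            "tcp        1      0 10.0.0.1:40001   10.0.0.2:50010   CLOSE_WAIT",
            "tcp        1      0 10.0.0.1:40002   10.0.0.2:50010   CLOSE_WAIT",
            "tcp        0      0 10.0.0.1:40003   10.0.0.3:50010   ESTABLISHED");
        System.out.println(countCloseWaitByRemote(sample)); // prints {10.0.0.2:50010=2}
    }
}
```

If nearly all entries map to a single address on port 50010, that is consistent with the leak being on connections to one datanode's data transfer port.
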
I have also reviewed these issues:
HDFS-5697 <https://issues.apache.org/jira/browse/HDFS-5697> 
HDFS-5671 <https://issues.apache.org/jira/browse/HDFS-5671> 
HDFS-1836 <https://issues.apache.org/jira/browse/HDFS-1836> 
HBASE-9393 <https://issues.apache.org/jira/browse/HBASE-9393>
I found that the patches for these issues are already included in the
HBase 0.98/Hadoop 2.4.0 source code, so I do not understand why
HBase 0.98/Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.
The changes were added to BlockReaderFactory.getRemoteBlockReaderFromTcp(),
shown below. Perhaps another bug is causing my problem.


BlockReaderFactory.java

// Some comments here
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
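
As a reading aid, the finally block in the method above can be reduced to a small, self-contained Java sketch. All names here (PeerCleanupSketch, FakePeer, FakeReader, openReader, closeQuietly) are hypothetical stand-ins, not HDFS classes. The sketch only shows the shape of the pattern: the peer is closed when no reader could be constructed, and is left open when a reader is returned. If that reading is right, a socket handed to a successfully built reader would have to be released somewhere else, which may be worth checking when looking for the leak.

```java
import java.io.Closeable;
import java.io.IOException;

public class PeerCleanupSketch {

    /** Hypothetical stand-in for a datanode connection; not the HDFS Peer interface. */
    static class FakePeer implements Closeable {
        boolean closed = false;

        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a block reader that takes ownership of its peer. */
    static class FakeReader {
        final FakePeer peer;

        FakeReader(FakePeer peer) {
            this.peer = peer;
        }
    }

    /** Builds a reader, closing the peer only if construction fails. */
    static FakeReader openReader(FakePeer peer, boolean simulateFailure) throws IOException {
        FakeReader reader = null;
        try {
            if (simulateFailure) {
                throw new IOException("simulated failure while constructing the reader");
            }
            reader = new FakeReader(peer);
            return reader;
        } finally {
            // Same shape as the original finally block: clean up only when
            // no reader took ownership of the peer.
            if (reader == null) {
                closeQuietly(peer);
            }
        }
    }

    /** Closes a resource and swallows any IOException; null is ignored. */
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful to do in this sketch.
        }
    }
}
```

Calling openReader(peer, true) throws and leaves peer.closed set to true, while openReader(peer, false) returns a reader and leaves the peer open for the caller to close later.
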

 
