Posted to hdfs-dev@hadoop.apache.org by Rakesh R <ra...@huawei.com> on 2014/11/26 06:54:55 UTC

DFS#close waits a long time for the AckedSeqno response

Hi,

I can see that DFSClient#close goes into TIMED_WAITING for a long time and does not come out.
While analyzing the issue, I found that the DFSClient fails to communicate with the datanode; the cause of the failure is that my KDC server was down for some time. On the other side, the DFSClient waits indefinitely for the acked seqno.

I feel that, rather than entering an infinite wait, the client could wait for a configurable amount of time and then close. What do others think?

      synchronized (dataQueue) {
        while (!closed) {
          checkClosed();
          if (lastAckedSeqno >= seqno) {
            break;
          }
          try {
            dataQueue.wait(1000); // when we receive an ack, we notify on
                                  // dataQueue
          } catch (InterruptedException ie) {
            throw new InterruptedIOException(
                "Interrupted while waiting for data to be acknowledged by pipeline");
          }
        }
      }
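To illustrate the proposal, here is a minimal sketch of how the loop above could be bounded by a deadline. This is not the actual DFSOutputStream code; the class, the `ackTimeoutMs` parameter, and the `ackReceived` helper are hypothetical, standing in for a value that would come from client configuration and for the response-processor thread that notifies on dataQueue.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.LinkedList;

// Hypothetical sketch: bound the ack wait with a configurable timeout
// instead of waiting forever when the pipeline never responds.
public class AckWaitSketch {
    private final LinkedList<Long> dataQueue = new LinkedList<>();
    private volatile boolean closed = false;
    private volatile long lastAckedSeqno = -1;
    private final long ackTimeoutMs; // would come from client configuration

    public AckWaitSketch(long ackTimeoutMs) {
        this.ackTimeoutMs = ackTimeoutMs;
    }

    // Stand-in for the response processor: records an ack and wakes waiters.
    public void ackReceived(long seqno) {
        synchronized (dataQueue) {
            lastAckedSeqno = seqno;
            dataQueue.notifyAll();
        }
    }

    public void waitForAckedSeqno(long seqno) throws IOException {
        long deadline = System.currentTimeMillis() + ackTimeoutMs;
        synchronized (dataQueue) {
            while (!closed && lastAckedSeqno < seqno) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    // Instead of looping forever, fail the close after the
                    // configured timeout so the caller can react.
                    throw new IOException("Timed out after " + ackTimeoutMs
                        + " ms waiting for ack of seqno " + seqno);
                }
                // Cap each wait at 1000 ms, like the original loop, so the
                // deadline is rechecked even without a notify.
                try {
                    dataQueue.wait(Math.min(remaining, 1000));
                } catch (InterruptedException ie) {
                    throw new InterruptedIOException(
                        "Interrupted while waiting for data to be acknowledged by pipeline");
                }
            }
        }
    }
}
```

With this shape, a dead pipeline (for example when the KDC is down and the datanode connection cannot be established) surfaces as an IOException after ackTimeoutMs rather than an indefinite TIMED_WAITING in close().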

"pool-8-thread-1" prio=10 tid=0x0000000000b66800 nid=0x59bc in Object.wait() [0x00007fde12b74000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2034)
                - locked <0x00000006386da3b0> (a java.util.LinkedList)
                at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2019)
                at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2111)
                - locked <0x00000006384b37b0> (a org.apache.hadoop.hdfs.DFSOutputStream)
                at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:856)
                at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:873)
                - locked <0x00000006382fce40> (a org.apache.hadoop.hdfs.DFSClient)
                at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:857)
                at com.huawei.dpa.hdfs.utils.AbstractConn.close(AbstractConn.java:67)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool.updateConnPool(HdfsConnPool.java:149)
                - locked <0x0000000638515288> (a com.huawei.dpa.hdfs.utils.HdfsConnPool)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool.access$100(HdfsConnPool.java:14)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool$1.run(HdfsConnPool.java:137)


Has anyone faced similar issues? Any help is appreciated. Thanks!


Thanks,
Rakesh