Posted to hdfs-dev@hadoop.apache.org by "Kihwal Lee (JIRA)" <ji...@apache.org> on 2013/11/11 23:44:17 UTC

[jira] [Created] (HDFS-5500) Critical datanode threads may terminate silently on uncaught exceptions

Kihwal Lee created HDFS-5500:
--------------------------------

             Summary: Critical datanode threads may terminate silently on uncaught exceptions
                 Key: HDFS-5500
                 URL: https://issues.apache.org/jira/browse/HDFS-5500
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Kihwal Lee
            Priority: Critical


We've seen the refreshUsed (DU) thread disappear on uncaught exceptions. This can go unnoticed for a long time. If an OutOfMemoryError (OOM) occurs, more things can go wrong. On one occasion, the Timer thread, multiple refreshUsed threads and the DataXceiverServer thread had all terminated.
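As a rough illustration of how a thread death can be made visible instead of silent, the sketch below gives a thread a `Thread.UncaughtExceptionHandler` that logs and records whatever killed it. The class and method names (`CriticalThreads`, `startCritical`, `joinAndGetFailure`) are hypothetical and are not Hadoop APIs; a real datanode might prefer to abort or trigger a shutdown rather than only log.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CriticalThreads {
    /**
     * Hypothetical helper (not a Hadoop API): starts a named thread whose
     * death by an uncaught Throwable is logged and recorded in 'failure'
     * instead of going unnoticed.
     */
    static Thread startCritical(String name, Runnable body, AtomicReference<Throwable> failure) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        t.setUncaughtExceptionHandler((thread, err) -> {
            // Runs on the dying thread just before it terminates.
            failure.set(err);
            System.err.println("FATAL: critical thread '" + thread.getName() + "' died: " + err);
        });
        t.start();
        return t;
    }

    /** Waits for the thread to finish and returns what killed it, or null if it exited normally. */
    static Throwable joinAndGetFailure(Thread t, AtomicReference<Throwable> failure) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t = startCritical("refreshUsed-sketch", () -> {
            // Stand-in for a periodic "du" refresh that hits an unexpected error.
            throw new IllegalStateException("simulated du failure");
        }, failure);
        Throwable cause = joinAndGetFailure(t, failure);
        System.out.println("thread died with: " + cause.getMessage());
    }
}
```

Recording the failure, rather than only logging it, lets a periodic health check or a metrics endpoint report that a critical thread is gone.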

DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I am not sure this really helps. In one case, the thread did this multiple times and then terminated. I suspect another OOM was thrown while it was in a catch block. As a result, the server socket was not closed and clients hung on connect. If the thread had at least closed the socket, clients would have been affected less.
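A minimal sketch of the "at least close the socket" idea, assuming a simple single-threaded accept loop: the listening socket is closed in a `finally` block, so however the loop ends (normal shutdown, an `IOException`, or an `Error` such as `OutOfMemoryError`), new clients get a refused connection instead of hanging. `AcceptLoopSketch`, `serve` and `socketClosedAfterFatalError` are hypothetical names; this is not the actual DataXceiverServer code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

public class AcceptLoopSketch {
    /**
     * Hypothetical accept loop: whatever terminates it, the finally block
     * closes the listening socket. An Error still propagates after finally
     * runs, so the thread dies, but it no longer leaves a half-alive listener.
     */
    static void serve(ServerSocket server, Consumer<Socket> handler) {
        try {
            while (!server.isClosed()) {
                Socket s = server.accept();
                handler.accept(s);
            }
        } catch (IOException e) {
            System.err.println("accept loop stopping: " + e);
        } finally {
            try {
                server.close();
            } catch (IOException ignored) {
                // Already shutting down; nothing more to do.
            }
        }
    }

    /** Demo: a handler throwing an Error kills the loop; returns whether the listener was closed. */
    static boolean socketClosedAfterFatalError() {
        try {
            ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            Thread loop = new Thread(() -> serve(server, s -> {
                // Simulate an OutOfMemoryError thrown while handling a connection.
                throw new OutOfMemoryError("simulated");
            }), "acceptLoop-sketch");
            loop.setUncaughtExceptionHandler((t, e) -> System.err.println(t.getName() + " died: " + e));
            loop.start();
            // One client connection is enough to wake accept() and trigger the handler.
            new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort()).close();
            loop.join();
            return server.isClosed();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("listener closed after fatal error: " + socketClosedAfterFatalError());
    }
}
```

Under real memory pressure the `finally` block itself could fail, so this narrows the window for the hang rather than eliminating it.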



--
This message was sent by Atlassian JIRA
(v6.1#6144)