Posted to hdfs-dev@hadoop.apache.org by "Kihwal Lee (JIRA)" <ji...@apache.org> on 2013/11/11 23:44:17 UTC
[jira] [Created] (HDFS-5500) Critical datanode threads may terminate silently on uncaught exceptions
Kihwal Lee created HDFS-5500:
--------------------------------
Summary: Critical datanode threads may terminate silently on uncaught exceptions
Key: HDFS-5500
URL: https://issues.apache.org/jira/browse/HDFS-5500
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical
We've seen the refreshUsed (DU) thread disappear on uncaught exceptions. This can go unnoticed for a long time. If an OOM occurs, more things can go wrong. On one occasion, the Timer thread, multiple refreshUsed threads, and the DataXceiverServer thread had all terminated.
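One way to make such thread deaths visible is an uncaught exception handler. The following is a minimal sketch, not the datanode's actual code: the class CriticalThreads, its methods, and the thread name "refreshUsed-demo" are hypothetical names chosen for illustration. Note that under a real OOM the handler itself may fail to allocate, so this only reduces the problem.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class CriticalThreads {
    // Hypothetical record of thread deaths that a monitor could poll.
    static final ConcurrentLinkedQueue<String> deaths = new ConcurrentLinkedQueue<>();

    // Starts a daemon thread whose death by uncaught exception is logged and recorded.
    static Thread start(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        t.setUncaughtExceptionHandler((thread, err) -> {
            deaths.add(thread.getName() + ": " + err);
            System.err.println("Critical thread " + thread.getName() + " terminated: " + err);
        });
        t.start();
        return t;
    }

    // Waits up to the given time for the thread to exit; returns true if it has exited.
    static boolean awaitExit(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        // Simulate a critical thread dying on an unexpected exception.
        Thread t = start("refreshUsed-demo", () -> {
            throw new IllegalStateException("simulated failure");
        });
        awaitExit(t, 5000);
        System.out.println("Recorded deaths: " + deaths);
    }
}
```

Whether the right reaction is to log, restart the thread, or shut the process down is a separate policy decision; the sketch only ensures the termination is no longer silent.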
DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I am not sure this really helps. In one case, the thread did this multiple times and then terminated. I suspect another OOM was thrown while in the catch block. As a result, the server socket was never closed and clients hung on connect. Had the thread at least closed the socket, the client-side impact would have been smaller.
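The idea of closing the socket no matter how the accept thread exits can be sketched with a finally block. This is an illustration under stated assumptions, not DataXceiverServer's implementation: the class AcceptLoop, its handler parameter, and the demo method are hypothetical, and the OutOfMemoryError is simulated by throwing it directly.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

public class AcceptLoop implements Runnable {
    private final ServerSocket server;
    private final Consumer<Socket> handler; // hypothetical per-connection work

    AcceptLoop(ServerSocket server, Consumer<Socket> handler) {
        this.server = server;
        this.handler = handler;
    }

    @Override
    public void run() {
        try {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    handler.accept(s);
                }
            }
        } catch (IOException e) {
            // accept() failed or the socket was closed elsewhere; fall through to cleanup.
        } finally {
            // Runs even when an Error such as OutOfMemoryError escapes the loop,
            // so later clients get a refused connection instead of hanging.
            try {
                server.close();
            } catch (IOException ignored) {
                // Nothing more can be done here.
            }
        }
    }

    // Simulates an Error in the handler and reports whether the socket ended up closed.
    static boolean demo() {
        try {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            ServerSocket server = new ServerSocket(0, 50, loopback); // ephemeral port
            Thread t = new Thread(new AcceptLoop(server, s -> {
                throw new OutOfMemoryError("simulated");
            }), "acceptor-demo");
            t.setUncaughtExceptionHandler((th, err) ->
                System.err.println(th.getName() + " died: " + err));
            t.start();
            new Socket(loopback, server.getLocalPort()).close(); // trigger one accept
            t.join(5000);
            return !t.isAlive() && server.isClosed();
        } catch (IOException | InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Server socket closed after thread death: " + demo());
    }
}
```

A finally block does not depend on a catch clause completing normally, which is the failure mode suspected above, although under real memory exhaustion the cleanup itself is still not guaranteed to succeed.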
--
This message was sent by Atlassian JIRA
(v6.1#6144)