Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2013/11/19 12:01:22 UTC

[jira] [Commented] (HADOOP-10113) There are some threads which will be dead silently when uncaught exception/error occurs

    [ https://issues.apache.org/jira/browse/HADOOP-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826381#comment-13826381 ] 

Steve Loughran commented on HADOOP-10113:
-----------------------------------------

In a past project I had a thread base class that could be set up to send a callback on completion (with any exception). With something like that here, a handler could be set up in each of the parents to let them deal with it. They do need to differentiate a planned exit from an unplanned exit though, which is straightforward in a YARN service but less consistent for the others.
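
A minimal sketch of what such a base class might look like, to make the idea concrete. This is not existing Hadoop code: the names ThreadExitListener, NotifyingThread, doWork() and requestStop() are all hypothetical, and "planned" is assumed to mean "the parent asked the thread to stop before it exited".

```java
/** Hypothetical listener; not an existing Hadoop interface. */
interface ThreadExitListener {
  /**
   * @param thread  the thread that finished
   * @param planned true if a stop was requested before the thread exited
   * @param cause   the Throwable that ended the work, or null on a clean return
   */
  void threadExited(Thread thread, boolean planned, Throwable cause);
}

/** Hypothetical base class that reports to its parent how the work ended. */
abstract class NotifyingThread extends Thread {
  private final ThreadExitListener listener;
  private volatile boolean stopRequested;

  protected NotifyingThread(String name, ThreadExitListener listener) {
    super(name);
    this.listener = listener;
  }

  /** Subclasses put their loop here instead of overriding run(). */
  protected abstract void doWork() throws Exception;

  /** The parent calls this so that the following exit counts as planned. */
  public void requestStop() {
    stopRequested = true;
    interrupt();
  }

  protected boolean isStopRequested() {
    return stopRequested;
  }

  @Override
  public final void run() {
    Throwable cause = null;
    try {
      doWork();
    } catch (Throwable t) {
      // Capture exceptions and errors so the thread cannot die silently.
      cause = t;
    } finally {
      listener.threadExited(this, stopRequested, cause);
    }
  }
}

class NotifyingThreadDemo {
  public static void main(String[] args) throws InterruptedException {
    // The "parent" decides what to do: log, restart the thread, or shut down.
    ThreadExitListener parent = (thread, planned, cause) ->
        System.out.println(thread.getName() + " exited, planned=" + planned
            + ", cause=" + cause);

    NotifyingThread failing = new NotifyingThread("failing-worker", parent) {
      @Override
      protected void doWork() {
        throw new IllegalStateException("simulated failure");
      }
    };
    failing.start();
    failing.join();
  }
}
```

With this shape the parent sees every exit, and an exit with planned=false (with or without a cause) is the case it would need to react to.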

> There are some threads which will be dead silently when uncaught exception/error occurs
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10113
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Kousuke Saruta
>             Fix For: 3.0.0
>
>
> Related to HDFS-5500, I found that some threads die silently when an uncaught exception/error occurs.
> For example, the following threads are the ones I mean:
> * refreshUsed in DU
> * reloader in ReloadingX509TrustManager
> * t in UserGroupInformation#spawnAutoRenewalThreadForUserCreds
> * errThread in Shell#runCommand
> * sinkThread in MetricsSinkAdapter
> * blockScannerThread in DataBlockScanner
> * emptier in NameNode#startTrashEmptier (when we use TrashPolicyDefault) 
> Some of these threads are critical, and it is a problem if we cannot notice that they have died (e.g. DU). I think we should handle those exceptions/errors, and either monitor liveness or log the failure.
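
As a rough illustration of the "log that" option in the description above, the JDK's Thread.UncaughtExceptionHandler can be attached to a thread so that its death is at least recorded. This is only a sketch: the LoggingHandler class and the thread name "refreshUsed-demo" are hypothetical and not taken from the Hadoop code base, and real code would presumably use the project's logger rather than System.err.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical handler that records and prints the death of a thread. */
class LoggingHandler implements Thread.UncaughtExceptionHandler {
  // Kept in memory so that a monitor could inspect the failures later.
  final List<String> records = new CopyOnWriteArrayList<>();

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    String record = "Thread " + t.getName() + " died: " + e;
    records.add(record);
    System.err.println(record);
  }
}

class LoggingHandlerDemo {
  public static void main(String[] args) throws InterruptedException {
    LoggingHandler handler = new LoggingHandler();

    // Stand-in for a background thread that hits an unexpected failure.
    Thread worker = new Thread(() -> {
      throw new IllegalStateException("simulated failure");
    }, "refreshUsed-demo");

    // Without a handler, only the JVM's default stack trace on stderr
    // would show that this thread had gone.
    worker.setUncaughtExceptionHandler(handler);
    worker.start();
    worker.join();
  }
}
```

Thread.setDefaultUncaughtExceptionHandler would cover threads that have no handler of their own, at the cost of being process-wide.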



--
This message was sent by Atlassian JIRA
(v6.1#6144)