Posted to hdfs-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2015/03/10 02:37:39 UTC
[jira] [Resolved] (HDFS-2182) Exceptions in DataXceiver#run can result in a zombie datanode
[ https://issues.apache.org/jira/browse/HDFS-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HDFS-2182.
------------------------------------
Resolution: Later
Closing this as stale.
> Exceptions in DataXceiver#run can result in a zombie datanode
> --------------------------------------------------------------
>
> Key: HDFS-2182
> URL: https://issues.apache.org/jira/browse/HDFS-2182
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Eli Collins
> Assignee: Eli Collins
> Attachments: hdfs-2182-1.patch, hdfs-2182-2.patch
>
>
> DataXceiver#run currently swallows all exceptions; it should instead plumb them up to DataXceiverServer#run, which can then decide whether the exception should be tolerated or the daemon should exit. An IOException should be tolerated (because it is likely just an issue with a particular thread, or an intermittent failure), as it is today, but e.g. java.lang.Error should not.
> This came up in the following bug I'm seeing on a test cluster: if e.g. a NoClassDefFoundError is thrown in DataXceiver#run (because the host jars were replaced out from underneath it, it ran out of file descriptors, etc.), we end up with a datanode that is alive but always fails because it can't create any DataXceiver threads. In this case the datanode should shut itself down rather than continue to run.
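The policy described above can be sketched as follows. This is a minimal illustration of the proposal, not the actual Hadoop patch; the class and method names are hypothetical, chosen only to show the distinction between tolerable IOExceptions and fatal java.lang.Errors:

```java
import java.io.IOException;

// Hypothetical sketch (not the HDFS-2182 patch itself): a server loop that
// catches throwables plumbed up from a worker thread and decides whether
// the daemon should keep running or shut itself down.
public class ExceptionPolicySketch {

    /** Returns true if the daemon should keep running after this throwable. */
    static boolean shouldTolerate(Throwable t) {
        // IOExceptions are likely per-connection or intermittent failures:
        // tolerate them, as the current code effectively does.
        if (t instanceof IOException) {
            return true;
        }
        // java.lang.Error (NoClassDefFoundError, OutOfMemoryError, ...)
        // suggests the JVM or its environment is broken; continuing would
        // leave a "zombie" daemon that can never serve requests.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldTolerate(new IOException("connection reset")));
        System.out.println(shouldTolerate(new NoClassDefFoundError("jar replaced")));
    }
}
```

With this split, a NoClassDefFoundError raised while creating worker threads would cause the datanode to exit (and be restarted by its supervisor) instead of lingering alive but unable to accept any transfers.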
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)