Posted to common-dev@hadoop.apache.org by "Konstantin Boudnik (JIRA)" <ji...@apache.org> on 2010/09/29 18:08:34 UTC

[jira] Created: (HADOOP-6979) System test framework needs to black list unresponsive cluster nodes after a timeout

System test framework needs to black list unresponsive cluster nodes after a timeout 
-------------------------------------------------------------------------------------

                 Key: HADOOP-6979
                 URL: https://issues.apache.org/jira/browse/HADOOP-6979
             Project: Hadoop Common
          Issue Type: Improvement
          Components: test
    Affects Versions: 0.22.0
            Reporter: Konstantin Boudnik


Sometimes one or more nodes in a cluster deployed for system testing become unresponsive (hardware failure, Hadoop daemon crash, etc.). In the current implementation, Herriot keeps trying to connect to such a node forever, or until a timeout occurs. Instead, an unresponsive node should be placed on a blacklist so that the framework can move on.

A cluster should be declared unusable if the NN or JT is placed on the blacklist, or if a certain percentage of DNs (TTs) are blacklisted.
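
The behaviour proposed above could be sketched roughly as follows. This is only an illustration and is not Herriot code: the class NodeBlacklist, its Role enum, the probe() helper and the maxWorkerFraction parameter are all hypothetical names. It also assumes that "a certain percentage" is checked separately for DNs and for TTs, and that the cluster becomes unusable once the blacklisted fraction strictly exceeds the threshold.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names exist in Herriot. */
class NodeBlacklist {
    enum Role { NN, JT, DN, TT }

    private final Map<String, Role> nodes = new ConcurrentHashMap<>();
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();
    // Assumed policy knob: tolerated fraction of blacklisted DNs (or TTs).
    private final double maxWorkerFraction;

    NodeBlacklist(double maxWorkerFraction) {
        this.maxWorkerFraction = maxWorkerFraction;
    }

    void register(String host, Role role) {
        nodes.put(host, role);
    }

    /** Blacklists a known node; unknown hosts are ignored. */
    void blacklist(String host) {
        if (nodes.containsKey(host)) {
            blacklisted.add(host);
        }
    }

    boolean isBlacklisted(String host) {
        return blacklisted.contains(host);
    }

    /**
     * Tries a TCP connect bounded by timeoutMs. On any I/O failure
     * (including a connect timeout) the node is blacklisted so that
     * later calls return immediately instead of retrying.
     */
    boolean probe(String host, int port, int timeoutMs) {
        if (isBlacklisted(host)) {
            return false;
        }
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            blacklist(host);
            return false;
        }
    }

    /** Unusable if NN or JT is blacklisted, or too many DNs or TTs are. */
    boolean isClusterUsable() {
        int dn = 0, dnBad = 0, tt = 0, ttBad = 0;
        for (Map.Entry<String, Role> e : nodes.entrySet()) {
            boolean bad = blacklisted.contains(e.getKey());
            switch (e.getValue()) {
                case NN:
                case JT:
                    if (bad) {
                        return false;
                    }
                    break;
                case DN:
                    dn++;
                    if (bad) dnBad++;
                    break;
                case TT:
                    tt++;
                    if (bad) ttBad++;
                    break;
            }
        }
        boolean dnOk = dn == 0 || (double) dnBad / dn <= maxWorkerFraction;
        boolean ttOk = tt == 0 || (double) ttBad / tt <= maxWorkerFraction;
        return dnOk && ttOk;
    }
}
```

A test driver would register every daemon host once, call probe() before talking to a node, and abort the run as soon as isClusterUsable() returns false. Whether a blacklisted node should ever be retried is left open here, as the issue does not say.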

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.