Posted to notifications@accumulo.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2014/07/03 21:26:33 UTC

[jira] [Created] (ACCUMULO-2976) blacklist problematic tservers

Sean Busbey created ACCUMULO-2976:
-------------------------------------

             Summary: blacklist problematic tservers
                 Key: ACCUMULO-2976
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2976
             Project: Accumulo
          Issue Type: Improvement
          Components: master
            Reporter: Sean Busbey
            Priority: Minor


It would be nice if the master kept track of tservers that misbehave and eventually blacklisted them, similar to how HDFS handles datanodes and MapReduce/YARN handle trackers.

Right now the closest thing we do is have the master kill the zoolock for tservers that are behaving poorly. This causes them to exit, unless they are in a zombie state.

On deployments with a watchdog that relaunches failed processes, this doesn't help much because the tserver comes back. In the case of, e.g., flaky network failures on the node, this just means repeating the process and hurting cluster performance while the master works out that it should kill the node again.
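
To make the idea concrete, here is a rough sketch of the kind of bookkeeping the master could do: count how often a tserver has had its lock killed within a sliding time window, and treat it as blacklisted once a threshold is reached. Everything here is hypothetical and for illustration only. The class name TserverBlacklist, its methods, the threshold, and the window are assumptions, not existing Accumulo code or configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a failure tracker; not part of Accumulo. */
class TserverBlacklist {
  private final int maxFailures;    // assumed threshold before blacklisting
  private final long windowMillis;  // assumed sliding window for counting failures
  private final Map<String, Deque<Long>> failures = new HashMap<>();

  TserverBlacklist(int maxFailures, long windowMillis) {
    this.maxFailures = maxFailures;
    this.windowMillis = windowMillis;
  }

  /** Record that the master had to kill the lock of the given tserver. */
  synchronized void recordFailure(String hostPort, long nowMillis) {
    Deque<Long> times = failures.computeIfAbsent(hostPort, k -> new ArrayDeque<>());
    times.addLast(nowMillis);
    expire(times, nowMillis);
  }

  /** True if the tserver has failed at least maxFailures times within the window. */
  synchronized boolean isBlacklisted(String hostPort, long nowMillis) {
    Deque<Long> times = failures.get(hostPort);
    if (times == null) {
      return false;
    }
    expire(times, nowMillis);
    return times.size() >= maxFailures;
  }

  /** Drop failure timestamps that are older than the window. */
  private void expire(Deque<Long> times, long nowMillis) {
    while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
      times.removeFirst();
    }
  }
}
```

With something like this, the master would call recordFailure each time it kills a tserver's lock, and would check isBlacklisted before accepting a relaunched tserver back. Because old failures age out of the window, a node that recovers would eventually be allowed to rejoin without operator action.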



--
This message was sent by Atlassian JIRA
(v6.2#6252)