Posted to yarn-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2015/03/12 01:58:38 UTC

[jira] [Created] (YARN-3337) Provide YARN chaos monkey

Steve Loughran created YARN-3337:
------------------------------------

             Summary: Provide YARN chaos monkey
                 Key: YARN-3337
                 URL: https://issues.apache.org/jira/browse/YARN-3337
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: test
    Affects Versions: 2.7.0
            Reporter: Steve Loughran


To test failure resilience today, you either need custom scripts or have to implement Chaos Monkey-like logic in your application (SLIDER-202).

The core activity here is killing AMs and containers on a schedule and with a configurable probability. A CLI app or client library could handle that.

# entry point to have a startup delay before acting
# frequency of chaos wakeup/polling
# probability of AM failure generation (0-100)
# probability of non-AM container kill
# future: other operations
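
As a rough illustration of how the options above could fit together, here is a minimal sketch of the scheduling loop only. Everything in it is an assumption made for illustration: the names ChaosConfig and ChaosMonkey, the field names, the default values, and the 0-100 range for the container probability (the list only gives a range for the AM one). The kill_am and kill_container callbacks are hypothetical placeholders; the sketch does not call any YARN API.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChaosConfig:
    """Hypothetical settings mirroring the options listed in the issue."""
    startup_delay_s: float = 60.0     # delay before the first chaos action
    interval_s: float = 30.0          # chaos wakeup/polling frequency
    am_kill_percent: int = 0          # probability of AM kill per wakeup, 0-100
    container_kill_percent: int = 0   # probability of non-AM container kill, 0-100 (assumed range)

    def __post_init__(self) -> None:
        for name in ("am_kill_percent", "container_kill_percent"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in 0-100, got {value}")


class ChaosMonkey:
    """Wakes up on a fixed interval and triggers failures by probability."""

    def __init__(
        self,
        config: ChaosConfig,
        kill_am: Callable[[], None],          # placeholder: how to kill an AM
        kill_container: Callable[[], None],   # placeholder: how to kill a container
        rng: Optional[random.Random] = None,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.config = config
        self.kill_am = kill_am
        self.kill_container = kill_container
        self.rng = rng or random.Random()
        self.sleep = sleep

    def _roll(self, percent: int) -> bool:
        # random() is in [0, 1), so 0 never fires and 100 always fires.
        return self.rng.random() * 100 < percent

    def tick(self) -> List[str]:
        """One wakeup: roll independently for each kind of failure."""
        actions = []
        if self._roll(self.config.am_kill_percent):
            self.kill_am()
            actions.append("am")
        if self._roll(self.config.container_kill_percent):
            self.kill_container()
            actions.append("container")
        return actions

    def run(self, max_ticks: int) -> List[List[str]]:
        """Wait for the startup delay, then perform max_ticks wakeups."""
        self.sleep(self.config.startup_delay_s)
        history = []
        for _ in range(max_ticks):
            history.append(self.tick())
            self.sleep(self.config.interval_s)
        return history
```

The sleep function and random source are injected so the loop can be exercised without real delays. A CLI or client library would supply kill callbacks that talk to the cluster.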




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)