Posted to dev@slider.apache.org by "Prasanth Jayachandran (JIRA)" <ji...@apache.org> on 2017/09/05 22:21:00 UTC
[jira] [Created] (SLIDER-1246) Application health should not be affected by faulty nodes
Prasanth Jayachandran created SLIDER-1246:
---------------------------------------------
Summary: Application health should not be affected by faulty nodes
Key: SLIDER-1246
URL: https://issues.apache.org/jira/browse/SLIDER-1246
Project: Slider
Issue Type: Bug
Affects Versions: Slider 1.0.0
Reporter: Prasanth Jayachandran
When a node is faulty, repeated container failures on that node are counted as an application failure.
Observed this in HIVE-16927, where container failures on certain nodes bring down the entire application. Slider should provide a way to avoid marking the application as unhealthy as long as a certain threshold of containers is running. Tuning the failure threshold is not optimal, as choosing the correct default on a large cluster is not trivial. Beyond a certain number of failures, Slider should mark the node as unhealthy and report that back to the client/AM. The application could continue to run as long as the container request is partially satisfied (for example, 80% of containers are running).
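
To make the proposal concrete, here is a minimal Python sketch of the behaviour described above. It is not Slider code: the class, method, and parameter names are hypothetical, the per-node failure limit of 3 is an assumed value, and only the 80% figure comes from the description. It tracks failures per node and judges application health by the fraction of requested containers that are running.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    """Hypothetical tracker; none of these names are Slider APIs."""
    desired: int                        # number of containers requested
    min_running_fraction: float = 0.8   # the 80% example from the description
    node_failure_limit: int = 3         # assumed per-node cutoff, not from Slider
    running: int = 0
    failures: Counter = field(default_factory=Counter)

    def container_started(self) -> None:
        self.running += 1

    def container_failed(self, node: str) -> None:
        # A failure removes one running container and is charged to its node.
        self.running = max(0, self.running - 1)
        self.failures[node] += 1

    def unhealthy_nodes(self) -> set:
        # Nodes whose failure count reached the limit; these would be
        # reported back to the client/AM instead of failing the application.
        return {n for n, c in self.failures.items()
                if c >= self.node_failure_limit}

    def is_healthy(self) -> bool:
        # Healthy while the request is at least partially satisfied.
        return self.running >= self.min_running_fraction * self.desired
```

With this shape, failures concentrated on one node flag that node without failing the application, and the application only becomes unhealthy once the running fraction drops below the configured minimum.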
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)