Posted to dev@slider.apache.org by Steve Loughran <st...@hortonworks.com> on 2014/08/12 15:39:58 UTC

SLIDER-77: windowed failure modes

I've checked in the sliding window code of SLIDER-77,
https://issues.apache.org/jira/browse/SLIDER-77. It appears to work in
automated tests as well as some manual ones (killing processes and the AM).

The docs are in svn:

content/docs/slider_specs/resource_specification.md

...you set `yarn.container.failure.threshold` in resources.json/global; the
window can be set in `yarn.container.failure.window.days`,
`yarn.container.failure.window.hours` and
`yarn.container.failure.window.minutes`.
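
To make the threshold/window interaction concrete, here is a minimal Python sketch. It is not the SLIDER-77 code. It assumes the days/hours/minutes options add up to a single window, and that the threshold counts as breached once the number of failures inside the window is strictly greater than it. `window_from_options` and `FailureWindow` are hypothetical names used only for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW_PREFIX = "yarn.container.failure.window."


def window_from_options(options):
    """Combine the days/hours/minutes options into one timedelta.

    Assumption: the three values are additive; missing ones count as 0.
    """
    return timedelta(
        days=int(options.get(WINDOW_PREFIX + "days", 0)),
        hours=int(options.get(WINDOW_PREFIX + "hours", 0)),
        minutes=int(options.get(WINDOW_PREFIX + "minutes", 0)),
    )


class FailureWindow:
    """Counts failures that fall inside a sliding time window (illustrative only)."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.failures = deque()  # timestamps, oldest first

    def record_failure(self, when):
        """Record a failure at `when`; return True if the threshold is exceeded."""
        self.failures.append(when)
        cutoff = when - self.window
        # Forget failures that have aged out of the window.
        while self.failures and self.failures[0] <= cutoff:
            self.failures.popleft()
        return len(self.failures) > self.threshold


# Hypothetical values, chosen only to show the mechanics.
options = {
    "yarn.container.failure.threshold": "2",
    "yarn.container.failure.window.hours": "1",
    "yarn.container.failure.window.minutes": "30",
}
tracker = FailureWindow(
    int(options["yarn.container.failure.threshold"]),
    window_from_options(options),
)
start = datetime(2014, 8, 12, 12, 0)
for offset in (0, 10, 20):
    exceeded = tracker.record_failure(start + timedelta(minutes=offset))
```

With these made-up numbers the window is 90 minutes, and the third failure inside it pushes the count past the threshold of 2.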


Now: what do we consider good numbers here?

I actually think a large threshold number may be best, to be resilient to
hardware failures, with a fairly small window (1-2 hours). If you lose lots
of containers in an hour, either the service is down or the YARN cluster is
in big trouble. And when I say lots, I mean more than 100% of the
containers.

We don't currently support per-component-type thresholds; we could extend
the logic to read in the value per role type, so that you could have a
lower threshold on HBase master failures than on region servers. As you
have many region servers, their failure counts would be higher, so it makes
sense. SLIDER-310 covers that.
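
As a rough illustration of what a per-role lookup could look like, here is a hypothetical Python sketch. It is not how Slider works today and not a design for SLIDER-310. It assumes a component entry may carry its own `yarn.container.failure.threshold` and otherwise falls back to the global value. The component names and numbers are made up for the example.

```python
THRESHOLD_KEY = "yarn.container.failure.threshold"


def failure_threshold(resources, component):
    """Return the threshold for a component, falling back to the global one.

    `resources` mimics a parsed resources.json with "global" and
    "components" sections. Returns None if no threshold is set anywhere.
    """
    component_opts = resources.get("components", {}).get(component, {})
    global_opts = resources.get("global", {})
    value = component_opts.get(THRESHOLD_KEY, global_opts.get(THRESHOLD_KEY))
    return None if value is None else int(value)


# Hypothetical configuration: a strict limit on masters, the general
# (larger) limit for the many region servers.
resources = {
    "global": {THRESHOLD_KEY: "10"},
    "components": {
        "HBASE_MASTER": {THRESHOLD_KEY: "2"},
        "HBASE_REGIONSERVER": {},
    },
}

master_limit = failure_threshold(resources, "HBASE_MASTER")
region_limit = failure_threshold(resources, "HBASE_REGIONSERVER")
```

Here the master picks up its own lower threshold while the region servers inherit the global one.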

For now: what do we think makes a good general failure threshold and window?
