Posted to dev@sling.apache.org by "Bertrand Delacretaz (JIRA)" <ji...@apache.org> on 2013/12/11 10:49:09 UTC

[jira] [Commented] (SLING-3278) Provide a HealthCheckExecutorService

    [ https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845234#comment-13845234 ] 

Bertrand Delacretaz commented on SLING-3278:
--------------------------------------------

I think the tentative spec at [1] covers async execution and caching, slightly reworked here to take Georg's suggestions into account:

# Executing a health check via the HealthCheckExecutor is guaranteed to return a Result in at most T msec. T is configurable and can be overridden in the execution call.
# If the actual health check execution takes longer than T, the executor returns the last result that was previously computed and cached, or an empty result with state=NODATA (new state) if we don't have that yet.
# The executor service prevents concurrent execution of a given health check.
# Execution of a health check times out after U msec, configurable, and returns a Result that indicates the timeout (with a new state? TBD)

With this you don't need an async property on a health check service: the decision to return its last cached result is based on its actual execution time. The switch in behavior when execution time exceeds T can be implemented by calling Future.get(timeout) and catching the TimeoutException to return the cached result.
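
A minimal sketch of points 1 to 3 and the Future.get(timeout) idea, in plain Java. All class, method and state names here (CachingExecutorSketch, execute, State) are hypothetical and are not the Sling API; the hard timeout U from point 4 is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch, not the actual HealthCheckExecutor.
class CachingExecutorSketch {
    enum State { OK, WARN, CRITICAL, NODATA }

    static final class Result {
        final State state;
        final String message;
        Result(State state, String message) {
            this.state = state;
            this.message = message;
        }
    }

    // Daemon threads so that a stuck check cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "hc-sketch");
        t.setDaemon(true);
        return t;
    });
    private final Map<String, Result> cache = new ConcurrentHashMap<>();
    private final Map<String, Future<Result>> running = new ConcurrentHashMap<>();

    /** Returns within roughly timeoutMsec (the "T" of point 1). */
    Result execute(String name, Supplier<Result> check, long timeoutMsec) {
        // Point 3: join the in-flight run of this check instead of starting a second one.
        Future<Result> future = running.computeIfAbsent(name, n -> pool.submit(() -> {
            try {
                Result r = check.get();
                cache.put(n, r); // remember the last computed result
                return r;
            } finally {
                running.remove(n);
            }
        }));
        try {
            return future.get(timeoutMsec, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Point 2: too slow, so fall back to the cached result or NODATA.
            return cache.getOrDefault(name, new Result(State.NODATA, "no result yet"));
        } catch (ExecutionException e) {
            return new Result(State.CRITICAL, "check failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return cache.getOrDefault(name, new Result(State.NODATA, "interrupted"));
        }
    }
}
```

In this sketch a slow check keeps running in the background after the caller has received the cached result, so the cache is refreshed for the next call without the caller waiting for it.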

I also suggest adding timing info to the Result: creation time, optional time to live and execution duration, to manage cache expiration and provide freshness info and execution statistics.
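
As an illustration of how such timing info could drive cache expiration, here is a small sketch. TimedResultSketch and its field and method names are hypothetical, not part of the existing Result class, and the status is reduced to a plain string for brevity.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a Result carrying timing info; not the Sling Result class.
final class TimedResultSketch {
    final String status;              // e.g. "OK", simplified for this sketch
    final Instant createdAt;          // creation time
    final Duration timeToLive;        // optional: null means "no TTL set"
    final Duration executionDuration; // how long the check took to run

    TimedResultSketch(String status, Instant createdAt,
                      Duration timeToLive, Duration executionDuration) {
        this.status = status;
        this.createdAt = createdAt;
        this.timeToLive = timeToLive;
        this.executionDuration = executionDuration;
    }

    /** Freshness info: how old this result is at the given instant. */
    Duration age(Instant now) {
        return Duration.between(createdAt, now);
    }

    /** Cache expiration: a result without a TTL never expires in this sketch. */
    boolean isExpired(Instant now) {
        return timeToLive != null && now.isAfter(createdAt.plus(timeToLive));
    }
}
```

Treating a missing time to live as "never expires" is only one possible choice; an executor-wide default TTL would work just as well.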

[1] http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html

> Provide a HealthCheckExecutorService
> ------------------------------------
>
>                 Key: SLING-3278
>                 URL: https://issues.apache.org/jira/browse/SLING-3278
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Georg Henzler
>
> Goals:
> * Be able to get an overall (aggregated) result as quickly as possible (ideally <2sec)
> * Whenever possible, return most current results (e.g. for a memory check)
> * Provide a declarative way for async checks (async checks should be the exception though) 
> Approach
> * Run checks in parallel
> * Make sure long running (or even stuck) checks are timed out
> * If a health check must run asynchronously (because its execution time cannot be optimized), it should be enough to just specify a service property (e.g. "hc.async").
> See also
> http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
> http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)