You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@sling.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2013/12/24 09:16:59 UTC

HealthCheckExecutor.execute(ServiceReference) ??

Hi,

About SLING-3278 - IMO execute(ServiceReference) at [1] is an
unnecessary leak of implementation details, it makes no sense at the
API level. It's HealthChecks that we are executing, not service
references.

IIUC the root cause is that the HealthCheck API doesn't provide
metadata that it should: tags, name, optional JMX MBean name, unique
ID maybe, all these things clearly belong to a HealthCheck, so the API
should provide access to them, even if they are defined by service
properties.

I suggest adding a HealthCheckMetadata interface to provide those
things (tags, name, optional things in a Map maybe), and add a
HealthCheck.getMetadata() method that returns this. That'll break
existing HealthCheck implementations, which we indicate by bumping up
the package version number, as this quite a new API it's better to fix
it now than later.

-Bertrand

[1] https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api/HealthCheckExecutor.java

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Hi Bertrand,


2013/12/31 Bertrand Delacretaz <bd...@apache.org>

>
>
> I don't understand why the service object cannot be used as a key.
>

First of all you would have to use an identity map as you can't expect that
the HC implements equals/hashCode correctly. But more important you don't
know how the HC service is implemented, is it a singleton, a service
factory or (new with OSGi R6) a prototype? If it's not a singleton you end
up with different objects for the same hc, so caching based on this would
simply not work.

Carsten


>
> In my SLING-3278-bertrand.patch the HealthCheckExecutorImpl maintains
> a Map<HealthCheck, HealthCheckExecutionWrapper> and uses a
> ServiceTracker to make sure there's exactly one entry in this map per
> registered HealthCheck service. You can search for "wrappers" at [1]
> to see how that works.
>
> Then, when executing an HC, its value is used to retrieve the
> corresponding wrapper and all is well - the wrapper can then take care
> of Futures, cache management etc.
>
> What's the problem with that?
>
> -Bertrand
>
> [1]
> https://issues.apache.org/jira/secure/attachment/12619079/SLING-3278-bertrand.patch
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Carsten,

On Tue, Dec 24, 2013 at 11:11 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...It's not just about the metadata - it's about uniquely identifying a HC
> within the executor - the service object is not a good key, especially as
> we have no way to discard it from the cache...

I don't understand why the service object cannot be used as a key.

In my SLING-3278-bertrand.patch the HealthCheckExecutorImpl maintains
a Map<HealthCheck, HealthCheckExecutionWrapper> and uses a
ServiceTracker to make sure there's exactly one entry in this map per
registered HealthCheck service. You can search for "wrappers" at [1]
to see how that works.

Then, when executing an HC, its value is used to retrieve the
corresponding wrapper and all is well - the wrapper can then take care
of Futures, cache management etc.

What's the problem with that?

-Bertrand

[1] https://issues.apache.org/jira/secure/attachment/12619079/SLING-3278-bertrand.patch

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Le 30 déc. 2013 10:30, "Carsten Ziegeler" <cz...@apache.org> a écrit :
>
> The more I think about it, the more I get the feeling that if we can't
away
> with execute(ServiceReference) in the executor api, this should be the
only
> method of that service...

That's what my SLING-3278-bertrand.patch does, attached to that issue. The
implementation is much simpler, the downside is that health checks are not
executed in parallel. Maybe returning a Future would do the trick.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

Hi Carsten,

> If - for whatever reason - I have an arbitrary set of HCs to execute 
> in
> parallel, this is not possible atm. So what about having just a 
> single
> method?
>
> execute(ServiceReference<HealthCheck>[] hcs)
>

This is an option, but I would at least add the convenience method

execute(String... tags)

that handles the lookup - I can't think of a use case for the arbitrary 
set of HCs at the moment, execution for tags or all checks is really all 
I would need and I'd love to see the most useful method in the API ;-)

>
> I also think we should move the executor service interface and the 
> result
> interface (regardless of how this looks) into a separate package.

Looks good.

>
> For the result, I think we should leave the Result class untouched as 
> this
> is the result as produces by the HC - which is not exactly the same 
> as the
> result produced by the executor.
> We can rename the result class from the Executor to
> HealthCheckExecutionResult - to make this clear and instead of having 
> the
> same methods as the Result class, simple add a method which returns 
> the
> Result object that was returned by the HealthCheck and just add 
> additional
> methods for the additional information. Instead of returning the meta 
> data
> (like tags, name etc.) we could also simply return the 
> ServiceReference -
> as we now have the service reference as input, we can return it as 
> well.

I would NOT return the ServiceReference... the descriptor is a lot 
nicer as it handles the reading of the HC service properties (sort of a 
data binding). Also from an implementors point of view, it is a lot 
nicer to be able to use a typed class than a service reference+accessing 
properties (having to find the prop names from somewhere, and then 
handling the complexity of reading an array making sure also getting the 
one-entry-array right).

Also we should not mix different data types:
1) The actual result being returned by the HC (unique per result)
2) HealthCheckDescriptor is unique per HC and gives easy, typed access 
to the properties
3) The timing information (unique per result) can go in 
HealthCheckExecutionResult (I don't really like to have to use 
getHealthCheckResult().getStatus(), I would rather extend an interface 
from org.apache.sling.hc.api, but then it gets all a bit more 
complicated than it needs to be - that's why I used one interface + 
UnsupportedOperationException)

@Betrand: In your wiki page you suggest mixing 2) and 3) in an untyped 
map - I think this is not a good idea (this is mixing information that 
is unique per HC and unique per execution).
@Carsten: If we stay with a separate execution result, we should 
definitely use get rid of getHealthCheckName() and getHealthCheckTags() 
and use getHealthCheckDescriptor() (or meta data if you prefer) to 
separate data type 2) from 3)

Regards
Georg

P.S. Happy & *healthy* new year! Last email from my side this year :)

>
> I've committed the changes except the last point to make it clearer 
> what I
> mean - we can easily revert if we disagree (rev 1554396)
>
> WDYT?
>
> Regards
> Carsten
>
>
> 2013/12/30 Georg Henzler <sl...@cq-eclipse-plugin.net>
>
>> Hi Betrand and Christian,
>>
>> now I'm afraid we might go a step backward... some things that you 
>> suggest
>> to be unnecessary are essentially important from my point of view -
>> supported by one year experience getting to this stage :).  Here are 
>> my
>> explanations:
>>
>> --- RE: executeXXX Methods
>>
>> We definitely need the executeForTags(String... tags) and 
>> executeAll()
>> methods because only this way we get parallel execution with 
>> intelligent
>> timeout handling and current results:
>>
>> - If all checks are quick (let's say the the longest check requires 
>> 100ms
>> so finish), executeXxx() will return after 100ms having all current 
>> results
>> (Bertrand, your patch was returning the results of the last call, if 
>> the
>> last call was 2h ago, all results would be two hours old!)
>> - If all checks except one are quick and let's say the one check 
>> takes 3s
>> timing out, we get all current results for the quick checks and one 
>> WARN
>> timeout result for one check (if we wanted we could change it to the 
>> last
>> cached result, but for my last project this behaviour worked better, 
>> also
>> see next bullet point)
>> - The global timeout is normally configured reasonably ensuring that 
>> when
>> things are ok, checks never time out
>> - Sometimes, if things go bad, a check can be stuck completely (I 
>> know
>> this shouldn't happen, but I have seen it happening). For this case 
>> a
>> special timeout can be configured marking checks as critical if they 
>> exceed
>> this timeout (10sec default)
>>
>> The current implementation is robust and correct in many ways:
>> - no matter how many calls to the health check executor are made 
>> (e.g. 100
>> per sec or only 1 every day), we get the latest results (respecting 
>> the
>> cache TTL) but without ever executing the same check twice (two 
>> request to
>> the web console started at the exact same time will only cause each 
>> check
>> to be called once and the two requests will return the exact same 
>> result)
>> - it uses never more threads than checks exist in the system
>> - no matter how bad one check is implemented or how "sick" the 
>> aspect the
>> check observes is, it will never in any way cause the executor to 
>> stop
>> working except that it returns red for the check (returning the 
>> results for
>> the other checks correctly)
>>
>> --- RE: Health Check MetaData
>>
>> It is important to distinguish two aspects:
>>
>> 1) defining the meta data
>> 2) allow access to the meta data
>>
>> to 1) I definitely think this should be done in a declarative way 
>> (not
>> programmatically by adding a method to the HealthCheck interface!). 
>> There
>> is just no need for touring complete code here. Using the service
>> properties as it's done in the moment is IMHO the best way.
>>
>> to 2) There should be definitely a way to access the meta data from 
>> the
>> result. My way was to use the class HealthCheckDescriptor and add a 
>> method
>> result.getHealthCheckDescriptor() to the result (funny enough that 
>> class
>> was called HealthCheckMetaData in my first version but I renamed it 
>> later
>> because I thought Descriptor is better). Having a flat result 
>> structure was
>> requested by Carsten, that's why we have the methods 
>> getHealthCheckName()
>> and getHealthCheckTags() now. The cleanest solution my opinion is to 
>> go
>> back to result.getHealthCheckDescriptor() - or we can call it
>> getHealthCheckMetaData() if you prefer. Also we can create an 
>> interface
>> HealthCheckMetaData() and keep that in line with the service 
>> properties.
>>
>> --- RE: "why can't we provide these values from the HealthCheck that 
>> was
>> used to create it?" (these values=timing information)
>>
>> Because
>> - every check would have to do it (code duplication, pushing 
>> complexity to
>> the implementor)
>> - it cannot be guaranteed, that every health check does it correctly
>>
>>
>> --- RE: UnsupportedOperationException in Result
>>
>> Using the UnsupportedOperationException is IMHO an elegant solution. 
>> The
>> exception is part of the JDK and for instance the Collection API is 
>> using
>> it as well. It is elegant because
>> - From a conceptual/object oriented view, it totally makes sense to 
>> have
>> getElapsedTime() or getFinishedAt() as properties of a result. 
>> However, the
>> HC itself cannot provide this information (see last section) and 
>> it's fine
>> to throw UnsupportedOperationException for these results then.
>> - If the methods would be taken away from the interface, a cast to
>> ExecutionResult would be needed (that is hidden at the moment 
>> anyway) to
>> show this information in the web console
>> - From an implementors point of view, it is quickly obvious that the
>> methods are only available when using the executor (the exception 
>> message
>> is pretty clear)
>> - Typical implementors would not even notice as they just construct 
>> the
>> result, but don't call methods on it (that is the job of the web 
>> console
>> and JMX)
>> - We should have getElapsedTime() and getFinishedAt() in the
>> HealthCheckResult interface because this information is useful for 
>> human
>> observers (and should be shown in web console / JMX)
>>
>>
>> -Georg
>>
>>
>> Am 30.12.2013 11:30, schrieb Carsten Ziegeler:
>>
>>  The more I think about it, the more I get the feeling that if we 
>> can't
>>> away
>>> with execute(ServiceReference) in the executor api, this should be 
>>> the
>>> only
>>> method of that service.
>>> The two others can easily be replaced by using the HCFilter and 
>>> then
>>> calling the method above for each service reference. It's just one 
>>> line
>>> more code, but this would completely remove the need to add meta
>>> information about the HC in the result and make the whole 
>>> implementation
>>> easier.
>>>
>>> WDYT?
>>>
>>> Carsten
>>>
>>>
>>> 2013/12/30 Carsten Ziegeler <cz...@apache.org>
>>>
>>>  2013/12/27 Bertrand Delacretaz <bd...@apache.org>
>>>>
>>>>  Hi Carsten and Georg,
>>>>>
>>>>> I was going to say that I still disagree, but being outnumbered 
>>>>> I'd
>>>>> let it go if it's just about a single method that I don't 
>>>>> like...and
>>>>> then I reviewed the org.apache.sling.hc.api package once again 
>>>>> and
>>>>> it's worse than I thought.
>>>>>
>>>>>  The problem is really, if we don't have a method with this 
>>>>> signature
>>>> how
>>>> can the jmx implementation make use of the executor while the 
>>>> executor is
>>>> still able to cache the result, do async processing etc. ? I don't 
>>>> see
>>>> any
>>>> other solution atm, apart from moving the jmx implementation into 
>>>> the
>>>> core.
>>>>
>>>>
>>>>
>>>>> You guys both say you don't want metadata in APIs...and the
>>>>> HealthCheckResult interface that you added has these two methods
>>>>> (which make sense BTW):
>>>>>
>>>>>   String getHealthCheckName();
>>>>>   List<String> getHealthCheckTags();
>>>>>
>>>>> So a HC is not brave enough to have this in its API, but a Result
>>>>> needs them? This clearly demonstrates that we need a
>>>>> HealthCheckMetadata interface that provides those values.
>>>>>
>>>>> I think these are different concerns, a service(!) does not need 
>>>>> an api
>>>>>
>>>> to expose the metadata because it's already in the service 
>>>> registration.
>>>> When we return a set of results, there need to be way of 
>>>> correlation
>>>> between a single result and an actual check, otherwise you get a 
>>>> bunch of
>>>> results without any additional information about the check.
>>>> The way out of this, would be to just have a single method
>>>> execute(ServiceReference) in the executor :) - but as soon as we 
>>>> allow
>>>> execution of more than a single HC we need this additional 
>>>> information.
>>>>
>>>>
>>>>  On top of that, the Result throws UnsupportedException on these
>>>>> methods...how ugly is that, why can't it provide these values 
>>>>> from the
>>>>> HealthCheck that was used to create it?
>>>>>
>>>>>  I haven't looked deeply at this part yet...
>>>>
>>>>
>>>>> To me this smells of implementation-driven APIs, we are used to 
>>>>> better
>>>>> design in Sling.
>>>>>
>>>>> As pointed out, I'm all ears for a better solution of the 
>>>>> executor
>>>>>
>>>> interface that allows us to get the above mentioned features. And 
>>>> as
>>>> pointed out below, if we simplify the executor service to a single
>>>> method,
>>>> we don't need to make any changes to Result or add any additional
>>>> information.
>>>>
>>>> Carsten
>>>>
>>>> What do others think?
>>>>
>>>>>
>>>>> -Bertrand
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>>
>>>>> https://svn.apache.org/repos/asf/sling/trunk/bundles/
>>>>> extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Carsten Ziegeler
>>>> cziegeler@apache.org
>>>>
>>>>
>>

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Hi Georg,

thanks for the explanation - I see your point of parallel execution, but :)
right now this is either executing all checks in parallel or based on tags.
If - for whatever reason - I have an arbitrary set of HCs to execute in
parallel, this is not possible atm. So what about having just a single
method?

execute(ServiceReference<HealthCheck>[] hcs)

The result from HealthCheckFilter.getTaggedHealthCheckServiceReferences()
can be fed directly into this method. And we solve the case where you have
a single service reference and arbitrary onces as well.

I also think we should move the executor service interface and the result
interface (regardless of how this looks) into a separate package.

For the result, I think we should leave the Result class untouched as this
is the result as produces by the HC - which is not exactly the same as the
result produced by the executor.
We can rename the result class from the Executor to
HealthCheckExecutionResult - to make this clear and instead of having the
same methods as the Result class, simple add a method which returns the
Result object that was returned by the HealthCheck and just add additional
methods for the additional information. Instead of returning the meta data
(like tags, name etc.) we could also simply return the ServiceReference -
as we now have the service reference as input, we can return it as well.

I've committed the changes except the last point to make it clearer what I
mean - we can easily revert if we disagree (rev 1554396)

WDYT?

Regards
Carsten





2013/12/30 Georg Henzler <sl...@cq-eclipse-plugin.net>

> Hi Betrand and Christian,
>
> now I'm afraid we might go a step backward... some things that you suggest
> to be unnecessary are essentially important from my point of view -
> supported by one year experience getting to this stage :).  Here are my
> explanations:
>
> --- RE: executeXXX Methods
>
> We definitely need the executeForTags(String... tags) and executeAll()
> methods because only this way we get parallel execution with intelligent
> timeout handling and current results:
>
> - If all checks are quick (let's say the the longest check requires 100ms
> so finish), executeXxx() will return after 100ms having all current results
> (Bertrand, your patch was returning the results of the last call, if the
> last call was 2h ago, all results would be two hours old!)
> - If all checks except one are quick and let's say the one check takes 3s
> timing out, we get all current results for the quick checks and one WARN
> timeout result for one check (if we wanted we could change it to the last
> cached result, but for my last project this behaviour worked better, also
> see next bullet point)
> - The global timeout is normally configured reasonably ensuring that when
> things are ok, checks never time out
> - Sometimes, if things go bad, a check can be stuck completely (I know
> this shouldn't happen, but I have seen it happening). For this case a
> special timeout can be configured marking checks as critical if they exceed
> this timeout (10sec default)
>
> The current implementation is robust and correct in many ways:
> - no matter how many calls to the health check executor are made (e.g. 100
> per sec or only 1 every day), we get the latest results (respecting the
> cache TTL) but without ever executing the same check twice (two request to
> the web console started at the exact same time will only cause each check
> to be called once and the two requests will return the exact same result)
> - it uses never more threads than checks exist in the system
> - no matter how bad one check is implemented or how "sick" the aspect the
> check observes is, it will never in any way cause the executor to stop
> working except that it returns red for the check (returning the results for
> the other checks correctly)
>
> --- RE: Health Check MetaData
>
> It is important to distinguish two aspects:
>
> 1) defining the meta data
> 2) allow access to the meta data
>
> to 1) I definitely think this should be done in a declarative way (not
> programmatically by adding a method to the HealthCheck interface!). There
> is just no need for touring complete code here. Using the service
> properties as it's done in the moment is IMHO the best way.
>
> to 2) There should be definitely a way to access the meta data from the
> result. My way was to use the class HealthCheckDescriptor and add a method
> result.getHealthCheckDescriptor() to the result (funny enough that class
> was called HealthCheckMetaData in my first version but I renamed it later
> because I thought Descriptor is better). Having a flat result structure was
> requested by Carsten, that's why we have the methods getHealthCheckName()
> and getHealthCheckTags() now. The cleanest solution my opinion is to go
> back to result.getHealthCheckDescriptor() - or we can call it
> getHealthCheckMetaData() if you prefer. Also we can create an interface
> HealthCheckMetaData() and keep that in line with the service properties.
>
> --- RE: "why can't we provide these values from the HealthCheck that was
> used to create it?" (these values=timing information)
>
> Because
> - every check would have to do it (code duplication, pushing complexity to
> the implementor)
> - it cannot be guaranteed, that every health check does it correctly
>
>
> --- RE: UnsupportedOperationException in Result
>
> Using the UnsupportedOperationException is IMHO an elegant solution. The
> exception is part of the JDK and for instance the Collection API is using
> it as well. It is elegant because
> - From a conceptual/object oriented view, it totally makes sense to have
> getElapsedTime() or getFinishedAt() as properties of a result. However, the
> HC itself cannot provide this information (see last section) and it's fine
> to throw UnsupportedOperationException for these results then.
> - If the methods would be taken away from the interface, a cast to
> ExecutionResult would be needed (that is hidden at the moment anyway) to
> show this information in the web console
> - From an implementors point of view, it is quickly obvious that the
> methods are only available when using the executor (the exception message
> is pretty clear)
> - Typical implementors would not even notice as they just construct the
> result, but don't call methods on it (that is the job of the web console
> and JMX)
> - We should have getElapsedTime() and getFinishedAt() in the
> HealthCheckResult interface because this information is useful for human
> observers (and should be shown in web console / JMX)
>
>
> -Georg
>
>
> Am 30.12.2013 11:30, schrieb Carsten Ziegeler:
>
>  The more I think about it, the more I get the feeling that if we can't
>> away
>> with execute(ServiceReference) in the executor api, this should be the
>> only
>> method of that service.
>> The two others can easily be replaced by using the HCFilter and then
>> calling the method above for each service reference. It's just one line
>> more code, but this would completely remove the need to add meta
>> information about the HC in the result and make the whole implementation
>> easier.
>>
>> WDYT?
>>
>> Carsten
>>
>>
>> 2013/12/30 Carsten Ziegeler <cz...@apache.org>
>>
>>  2013/12/27 Bertrand Delacretaz <bd...@apache.org>
>>>
>>>  Hi Carsten and Georg,
>>>>
>>>> I was going to say that I still disagree, but being outnumbered I'd
>>>> let it go if it's just about a single method that I don't like...and
>>>> then I reviewed the org.apache.sling.hc.api package once again and
>>>> it's worse than I thought.
>>>>
>>>>  The problem is really, if we don't have a method with this signature
>>> how
>>> can the jmx implementation make use of the executor while the executor is
>>> still able to cache the result, do async processing etc. ? I don't see
>>> any
>>> other solution atm, apart from moving the jmx implementation into the
>>> core.
>>>
>>>
>>>
>>>> You guys both say you don't want metadata in APIs...and the
>>>> HealthCheckResult interface that you added has these two methods
>>>> (which make sense BTW):
>>>>
>>>>   String getHealthCheckName();
>>>>   List<String> getHealthCheckTags();
>>>>
>>>> So a HC is not brave enough to have this in its API, but a Result
>>>> needs them? This clearly demonstrates that we need a
>>>> HealthCheckMetadata interface that provides those values.
>>>>
>>>> I think these are different concerns, a service(!) does not need an api
>>>>
>>> to expose the metadata because it's already in the service registration.
>>> When we return a set of results, there need to be way of correlation
>>> between a single result and an actual check, otherwise you get a bunch of
>>> results without any additional information about the check.
>>> The way out of this, would be to just have a single method
>>> execute(ServiceReference) in the executor :) - but as soon as we allow
>>> execution of more than a single HC we need this additional information.
>>>
>>>
>>>  On top of that, the Result throws UnsupportedException on these
>>>> methods...how ugly is that, why can't it provide these values from the
>>>> HealthCheck that was used to create it?
>>>>
>>>>  I haven't looked deeply at this part yet...
>>>
>>>
>>>> To me this smells of implementation-driven APIs, we are used to better
>>>> design in Sling.
>>>>
>>>> As pointed out, I'm all ears for a better solution of the executor
>>>>
>>> interface that allows us to get the above mentioned features. And as
>>> pointed out below, if we simplify the executor service to a single
>>> method,
>>> we don't need to make any changes to Result or add any additional
>>> information.
>>>
>>> Carsten
>>>
>>> What do others think?
>>>
>>>>
>>>> -Bertrand
>>>>
>>>> [1]
>>>>
>>>>
>>>>
>>>> https://svn.apache.org/repos/asf/sling/trunk/bundles/
>>>> extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api
>>>>
>>>>
>>>
>>>
>>> --
>>> Carsten Ziegeler
>>> cziegeler@apache.org
>>>
>>>
>


-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Jan 14, 2014 at 3:49 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> Adding a value to the enum would be a change in the api, which I would like
> to avoid.

ok, let's stay with boolean hasTimedOut() as is now.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Adding a value to the enum would be a change in the api, which I would like
to avoid.
And if you're just interested if execution is successful, returning a warn
looks like a good idea to me.

Carsten


2014/1/14 Bertrand Delacretaz <bd...@apache.org>

> On Tue, Jan 14, 2014 at 3:34 PM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > ...what about adding a boolean method, hasTimedOut() instead and always
> > return a correct date?...
>
> why not but then hasTimedOut() really means "no result available" so
> isn't the NO_RESULT status that we discussed earlier clearer and
> simpler?
>
> -Bertrand
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Jan 14, 2014 at 3:34 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...what about adding a boolean method, hasTimedOut() instead and always
> return a correct date?...

why not but then hasTimedOut() really means "no result available" so
isn't the NO_RESULT status that we discussed earlier clearer and
simpler?

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Hmm, that doesn't look right to me - in that case I would need to compare
the date with an (arbitrary) fixed value.
I see the point for the null check
So what about adding a boolean method, hasTimedOut() instead and always
return a correct date?

Carsten


2014/1/14 Bertrand Delacretaz <bd...@apache.org>

> Hi,
>
> On Fri, Jan 10, 2014 at 3:57 PM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > ...the interesting case still is b) - I see the point that we return WARN
> > in that case - I would assume that getElapsedTimeInMs returns -1 and
> > getFinishedAt() returns null. So both could be used as an indicator for
> > this situation....
>
> Turns out that I had an unsent draft on this, sorry...I agree in
> principle, WARN is ok to indicate "no result available".
>
> I'd return the epoch (new Date(1L) ?) for getFinishedAt() instead of
> null which requires you to check for null everywhere you use that.
>
> Returning
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Fri, Jan 10, 2014 at 3:57 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...the interesting case still is b) - I see the point that we return WARN
> in that case - I would assume that getElapsedTimeInMs returns -1 and
> getFinishedAt() returns null. So both could be used as an indicator for
> this situation....

Turns out that I had an unsent draft on this, sorry...I agree in
principle, WARN is ok to indicate "no result available".

I'd return the epoch (new Date(1L) ?) for getFinishedAt() instead of
null which requires you to check for null everywhere you use that.

Returning

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Answering my own question, I think its good if we always return the elapsed
time, but return null for getFinishedDate.

I'll do the change

Carsten


2014/1/10 Carsten Ziegeler <cz...@apache.org>

> I'Ve done the suggested changes:
> - moved jmx stuff into the core
> - changed the signature to execute(String... tags)
> - Added service reference to metadata
>
> The only thing I'm unclear with are the different result possibilites. I
> think there are these cases when execute is called:
> a) no result in the cache/cached result is obsolete, hc executes within
> the timeout -> hc is executed, real result is cached and returned
> b) no result in the cache/cached result is obsolete, hc executes not
> within the timeout -> hc is executed, fake result is returned
> c) valid result in the cache -> hc is not executed, cached result is
> returned
>
> I think there is no need for a client to find out whether it's case a) or
> c) - if interested client code can inspect the getFinishedAt() method.
> Now, the interesting case still is b) - I see the point that we return
> WARN in that case - I would assume that getElapsedTimeInMs returns -1 and
> getFinishedAt() returns null. So both could be used as an indicator for
> this situation.
>
> WDYT?
>
> Regards
> Carsten
>
>
> 2014/1/5 Georg Henzler <sl...@cq-eclipse-plugin.net>
>
>> Hi,
>>
>> I think this looks pretty good now! Thanks for adding HealthCheckMetadata
>> :)
>>
>> 1. "b) merge the jmx implementation into the core"
>> I'm happy with that and the signature execute(String ... tags) is my
>> preferred
>>
>> 2. Status NO_RESULT
>> I think it's probably cleaner to add a property hasTimedOut to the
>> ExecutionResult - this is also better for the case that we can return an
>> old result from cache. The main disadvantage of NO_RESULT is that it is not
>> immediately clear from a caller's perspective, "how bad" the status
>> NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I
>> think it really is WARN (as the timeout is configured in a way that should
>> be sufficient for all checks). Therefore explicitly returning WARN plus
>> hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a
>> consumer's point of view.
>>
>> 3. Execution Timeouts defined by caller / "executing the HCs tagged with
>> external ... with longer timeout"
>> The globally defined timeout is sufficient, I agree with Carsten. The
>> "external HCs" example is valid and I would define the global timeout to
>> fit to the external checks, the point is that for the "internal HCs" you
>> don't need to define a separate timeout because
>>  * they are usually quick - and if they aren't, they should probably be
>> rewritten! ;-)
>>  * from a conceptual point of view, the maximum time you are prepared to
>> wait is still the same as for the long running ones (why would you set the
>> timeout for internal checks to 500ms? If the checks are quick it makes no
>> difference, if a check requires 800ms I believe you are still better off
>> getting a proper result after 800ms instead of a timeout one after 500ms).
>> Anyway, as suggested let's leave it for later if we find a good use case
>> that justifies the more complex API.
>>
>> 4. HealthCheckDescriptor is obsolete?
>> The current version in SVN uses HealthCheckDescriptor and
>> HealthCheckMetadata in utils - you probably created it that way to not have
>> the property serviceReference in HealthCheckMetadata. On the other hand
>> HealthCheckMetadata is taking a service reference as constructor argument
>> anyway - why not just merge the two classes in HealthCheckMetadata and get
>> rid of HealthCheckDescriptor?
>>
>> 5. Have you had a look at https://issues.apache.org/
>> jira/browse/SLING-3302 ? The tabular result could also include the
>> property hasTimedOut from 2.
>>
>> Best Regards
>> Georg
>>
>> Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
>>
>>  Yes, I would prefer a separate state, parsing the log is too error prone.
>>>
>>> Carsten
>>>
>>>
>>> 2014/1/3 Bertrand Delacretaz <bd...@apache.org>
>>>
>>>  Hi,
>>>>
>>>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cz...@apache.org>
>>>> wrote:
>>>> > Bertrand wrote:
>>>> >> As is you can compute the age of a result, that should already help.
>>>> >>
>>>> > Hmm and what about the case, where there is no result yet and the HC
>>>> takes
>>>> > longer than the timeout?
>>>>
>>>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
>>>> a Result that includes a log that explains the problem. Haven't
>>>> checked what the current implementation does.
>>>>
>>>> We could also return a specific NO_RESULT state in this case.
>>>>
>>>> -Bertrand
>>>>
>>>>
>>
>
>
> --
> Carsten Ziegeler
> cziegeler@apache.org
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

I'Ve done the suggested changes:
- moved jmx stuff into the core
- changed the signature to execute(String... tags)
- Added service reference to metadata

The only thing I'm unclear with are the different result possibilites. I
think there are these cases when execute is called:
a) no result in the cache/cached result is obsolete, hc executes within the
timeout -> hc is executed, real result is cached and returned
b) no result in the cache/cached result is obsolete, hc executes not within
the timeout -> hc is executed, fake result is returned
c) valid result in the cache -> hc is not executed, cached result is
returned

I think there is no need for a client to find out whether it's case a) or
c) - if interested client code can inspect the getFinishedAt() method.
Now, the interesting case still is b) - I see the point that we return WARN
in that case - I would assume that getElapsedTimeInMs returns -1 and
getFinishedAt() returns null. So both could be used as an indicator for
this situation.

WDYT?

Regards
Carsten


2014/1/5 Georg Henzler <sl...@cq-eclipse-plugin.net>

> Hi,
>
> I think this looks pretty good now! Thanks for adding HealthCheckMetadata
> :)
>
> 1. "b) merge the jmx implementation into the core"
> I'm happy with that and the signature execute(String ... tags) is my
> preferred
>
> 2. Status NO_RESULT
> I think it's probably cleaner to add a property hasTimedOut to the
> ExecutionResult - this is also better for the case that we can return an
> old result from cache. The main disadvantage of NO_RESULT is that it is not
> immediately clear from a caller's perspective, "how bad" the status
> NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but I
> think it really is WARN (as the timeout is configured in a way that should
> be sufficient for all checks). Therefore explicitly returning WARN plus
> hasTimedOut=true in the ExecutionResult is IMHO the clearer option from a
> consumer's point of view.
>
> 3. Execution Timeouts defined by caller / "executing the HCs tagged with
> external ... with longer timeout"
> The globally defined timeout is sufficient, I agree with Carsten. The
> "external HCs" example is valid and I would define the global timeout to
> fit to the external checks, the point is that for the "internal HCs" you
> don't need to define a separate timeout because
>  * they are usually quick - and if they aren't, they should probably be
> rewritten! ;-)
>  * from a conceptual point of view, the maximum time you are prepared to
> wait is still the same as for the long running ones (why would you set the
> timeout for internal checks to 500ms? If the checks are quick it makes no
> difference, if a check requires 800ms I believe you are still better off
> getting a proper result after 800ms instead of a timeout one after 500ms).
> Anyway, as suggested let's leave it for later if we find a good use case
> that justifies the more complex API.
>
> 4. HealthCheckDescriptor is obsolete?
> The current version in SVN uses HealthCheckDescriptor and
> HealthCheckMetadata in utils - you probably created it that way to not have
> the property serviceReference in HealthCheckMetadata. On the other hand
> HealthCheckMetadata is taking a service reference as constructor argument
> anyway - why not just merge the two classes in HealthCheckMetadata and get
> rid of HealthCheckDescriptor?
>
> 5. Have you had a look at https://issues.apache.org/jira/browse/SLING-3302? The tabular result could also include the property hasTimedOut from 2.
>
> Best Regards
> Georg
>
> Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
>
>  Yes, I would prefer a separate state, parsing the log is too error prone.
>>
>> Carsten
>>
>>
>> 2014/1/3 Bertrand Delacretaz <bd...@apache.org>
>>
>>  Hi,
>>>
>>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cz...@apache.org>
>>> wrote:
>>> > Bertrand wrote:
>>> >> As is you can compute the age of a result, that should already help.
>>> >>
>>> > Hmm and what about the case, where there is no result yet and the HC
>>> takes
>>> > longer than the timeout?
>>>
>>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
>>> a Result that includes a log that explains the problem. Haven't
>>> checked what the current implementation does.
>>>
>>> We could also return a specific NO_RESULT state in this case.
>>>
>>> -Bertrand
>>>
>>>
>


-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

Hi,

I think this looks pretty good now! Thanks for adding 
HealthCheckMetadata :)

1. "b) merge the jmx implementation into the core"
I'm happy with that and the signature execute(String ... tags) is my 
preferred

2. Status NO_RESULT
I think it's probably cleaner to add a property hasTimedOut to the 
ExecutionResult - this is also better for the case that we can return an 
old result from cache. The main disadvantage of NO_RESULT is that it is 
not immediately clear from a caller's perspective, "how bad" the status 
NO_RESULT is. A naive caller might think this is sort of ok (GREEN) but 
I think it really is WARN (as the timeout is configured in a way that 
should be sufficient for all checks). Therefore explicitly returning 
WARN plus hasTimedOut=true in the ExecutionResult is IMHO the clearer 
option from a consumer's point of view.

3. Execution Timeouts defined by caller / "executing the HCs tagged 
with external ... with longer timeout"
The globally defined timeout is sufficient, I agree with Carsten. The 
"external HCs" example is valid and I would define the global timeout to 
fit to the external checks, the point is that for the "internal HCs" you 
don't need to define a separate timeout because
  * they are usually quick - and if they aren't, they should probably be 
rewritten! ;-)
  * from a conceptual point of view, the maximum time you are prepared 
to wait is still the same as for the long running ones (why would you 
set the timeout for internal checks to 500ms? If the checks are quick it 
makes no difference, if a check requires 800ms I believe you are still 
better off getting a proper result after 800ms instead of a timeout one 
after 500ms).
Anyway, as suggested let's leave it for later if we find a good use 
case that justifies the more complex API.

4. HealthCheckDescriptor is obsolete?
The current version in SVN uses HealthCheckDescriptor and 
HealthCheckMetadata in utils - you probably created it that way to not 
have the property serviceReference in HealthCheckMetadata. On the other 
hand HealthCheckMetadata is taking a service reference as constructor 
argument anyway - why not just merge the two classes in 
HealthCheckMetadata and get rid of HealthCheckDescriptor?

5. Have you had a look at 
https://issues.apache.org/jira/browse/SLING-3302 ? The tabular result 
could also include the property hasTimedOut from 2.

Best Regards
Georg

Am 03.01.2014 17:29, schrieb Carsten Ziegeler:
> Yes, I would prefer a separate state, parsing the log is too error 
> prone.
>
> Carsten
>
>
> 2014/1/3 Bertrand Delacretaz <bd...@apache.org>
>
>> Hi,
>>
>> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler 
>> <cz...@apache.org>
>> wrote:
>> > Bertrand wrote:
>> >> As is you can compute the age of a result, that should already 
>> help.
>> >>
>> > Hmm and what about the case, where there is no result yet and the 
>> HC
>> takes
>> > longer than the timeout?
>>
>> In my prototype I returned an HEALTH_CHECK_ERROR status for that, 
>> with
>> a Result that includes a log that explains the problem. Haven't
>> checked what the current implementation does.
>>
>> We could also return a specific NO_RESULT state in this case.
>>
>> -Bertrand
>>

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Yes, I would prefer a separate state, parsing the log is too error prone.

Carsten


2014/1/3 Bertrand Delacretaz <bd...@apache.org>

> Hi,
>
> On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > Bertrand wrote:
> >> As is you can compute the age of a result, that should already help.
> >>
> > Hmm and what about the case, where there is no result yet and the HC
> takes
> > longer than the timeout?
>
> In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
> a Result that includes a log that explains the problem. Haven't
> checked what the current implementation does.
>
> We could also return a specific NO_RESULT state in this case.
>
> -Bertrand
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Fri, Jan 3, 2014 at 3:28 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> Bertrand wrote:
>> As is you can compute the age of a result, that should already help.
>>
> Hmm and what about the case, where there is no result yet and the HC takes
> longer than the timeout?

In my prototype I returned an HEALTH_CHECK_ERROR status for that, with
a Result that includes a log that explains the problem. Haven't
checked what the current implementation does.

We could also return a specific NO_RESULT state in this case.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

2014/1/3 Bertrand Delacretaz <bd...@apache.org>

> Hi,
>
> On Fri, Jan 3, 2014 at 2:12 PM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > ...I'm not so sure about timeouts by the caller - I would assume that the
> > configured timeout for the executor is a sensible value. If you want to
> > specify something else for a specific client, this client would need to
> > know which HCs it's executing, how long it takes etc....
>
> Yes, my use case is executing the HCs tagged with "external" for
> example, I know those take longer than the performance counters for
> example, so I set a longer timeout for their execution to have a
> better chance of having fresh results.
>
> > ...We have discussed other things like forcing a HC to run (and not to
> use the
> > cached value), I'm not sure about those either....
>
> This is less clear for me as well.
>
> > ...We could think about adding an Options argument to the execute method
> where
> > we add getter/setters for these things....
>
> Yes, we can add a second execute method later that takes options.
>
> Ok, if we already know about some options we could add this already now.
Or we gather first some more expierence and add it later :)


> >
> > ...But I think we need a clear result if the timeout is reached to allow
> > clients to correctly display this and retry at a later point of time....
>
> As is you can compute the age of a result, that should already help.
>

Hmm and what about the case, where there is no result yet and the HC takes
longer than the timeout?

Carsten


> -Bertrand
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Fri, Jan 3, 2014 at 2:12 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...I'm not so sure about timeouts by the caller - I would assume that the
> configured timeout for the executor is a sensible value. If you want to
> specify something else for a specific client, this client would need to
> know which HCs it's executing, how long it takes etc....

Yes, my use case is executing the HCs tagged with "external" for
example, I know those take longer than the performance counters for
example, so I set a longer timeout for their execution to have a
better chance of having fresh results.

> ...We have discussed other things like forcing a HC to run (and not to use the
> cached value), I'm not sure about those either....

This is less clear for me as well.

> ...We could think about adding an Options argument to the execute method where
> we add getter/setters for these things....

Yes, we can add a second execute method later that takes options.

>
> ...But I think we need a clear result if the timeout is reached to allow
> clients to correctly display this and retry at a later point of time....

As is you can compute the age of a result, that should already help.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

I haven't thought about this yet - but I tend to agree that a health check
(or someone else) needs a way to define how long a result should be cached.
This could just be another service registration property as I don't think
that this is a highly dynamic value which changes from execution to
execution of a single HC.

I'm not so sure about timeouts by the caller - I would assume that the
configured timeout for the executor is a sensible value. If you want to
specify something else for a specific client, this client would need to
know which HCs it's executing, how long it takes etc.
We have discussed other things like forcing a HC to run (and not to use the
cached value), I'm not sure about those either.
We could think about adding an Options argument to the execute method where
we add getter/setters for these things.

But I think we need a clear result if the timeout is reached to allow
clients to correctly display this and retry at a later point of time.

Carsten

2014/1/3 Bertrand Delacretaz <bd...@apache.org>

> Hi,
>
> On Wed, Jan 1, 2014 at 4:57 PM, Georg Henzler
> <sl...@cq-eclipse-plugin.net> wrote:
> > ...I don't think it is that useful to be able to specify a timeout in
> the web
> > console HC plugin for the following reasons...
>
> I'm still not convinced, IMO we are very likely to need
>
> a) Results that don't all have the same time to live. The Health Check
> that creates the result would set the TTL.
>
> b) Execution timeouts specified by the caller, to cope with various
> types of health checks and clients
>
> Both are easy to add later however, so if you guys don't want to add
> them now I'm fine.
>
> -Bertrand
>

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Wed, Jan 1, 2014 at 4:57 PM, Georg Henzler
<sl...@cq-eclipse-plugin.net> wrote:
> ...I don't think it is that useful to be able to specify a timeout in the web
> console HC plugin for the following reasons...

I'm still not convinced, IMO we are very likely to need

a) Results that don't all have the same time to live. The Health Check
that creates the result would set the TTL.

b) Execution timeouts specified by the caller, to cope with various
types of health checks and clients

Both are easy to add later however, so if you guys don't want to add
them now I'm fine.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

Yes, only a single execute method with the tags.

Carsten


2014/1/3 Bertrand Delacretaz <bd...@apache.org>

> Hi Carsten,
>
> On Fri, Jan 3, 2014 at 9:22 AM, Carsten Ziegeler <cz...@apache.org>
> wrote:
> > ...I think it makes sense to move the
> > jmx stuff into the core, this keeps the executor service interface
> cleaner....
>
> So IIUC if we move jmx into core the HealthCheckExecutor would have just
>
>   Collection<HealthCheckExecutionResult> execute(String ... tags);
>
> ?
>
> If yes I'm +1 on the move, it doesn't add any dependencies to core.
>
> -Bertrand
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Carsten,

On Fri, Jan 3, 2014 at 9:22 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...I think it makes sense to move the
> jmx stuff into the core, this keeps the executor service interface cleaner....

So IIUC if we move jmx into core the HealthCheckExecutor would have just

  Collection<HealthCheckExecutionResult> execute(String ... tags);

?

If yes I'm +1 on the move, it doesn't add any dependencies to core.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

If I look at the use cases, we have either executing HCs based on tags or
the JMX support which executes single HCs. For the jmx stuff we need to
pass the service reference into the executor - for all other use cases,
just providing the tags is sufficient.

So basically we have two options:
a) keep the service reference in the signature of the HC
b) merge the jmx implementation into the core

While I like to keep things separate, I think it makes sense to move the
jmx stuff into the core, this keeps the executor service interface cleaner.
And the jmx name constant to register a HC as a JMX bean is defined in the
core anyways.

WDYT?

Regards
Carsten


2014/1/1 Georg Henzler <sl...@cq-eclipse-plugin.net>

>
> I updated the wiki page at https://cwiki.apache.org/
> confluence/display/SLING/Health+Checks+Executor+Design with API Variant
> B: It is mostly in line with the current version in SVN with the two
> changes I already suggested. Carsten, WDYT?
>
>
>
>> That might work...OTOH I want to have the execution timeout specified
>> in the execute call, as being able to change it is very useful when
>> troubleshooting things from the webconsole.
>>
>
> I don't think it is that useful to be able to specify a timeout in the web
> console HC plugin for the following reasons:
> * Health check futures are never canceled, they always keep running until
> they are finished even if the executor returns a timeout => it is possible
> to set the log level of the HC to debug and check the log file even after
> the web console returned a timeout
> * If there is trouble, it is usually necessary to have a look at the
> proper log files anyway (for instance, a HC result is currently not capable
> of carrying an exception)
> * If really a longer timeout is needed, it is always possible to configure
> a higher timeout in the OSGi config of the HealthCheckExecutor (this is
> always possible in DEV/TEST envs)
>
> -Georg
>
>


-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

I updated the wiki page at 
https://cwiki.apache.org/confluence/display/SLING/Health+Checks+Executor+Design 
with API Variant B: It is mostly in line with the current version in 
SVN with the two changes I already suggested. Carsten, WDYT?

>
> That might work...OTOH I want to have the execution timeout specified
> in the execute call, as being able to change it is very useful when
> troubleshooting things from the webconsole.

I don't think it is that useful to be able to specify a timeout in the 
web console HC plugin for the following reasons:
* Health check futures are never canceled, they always keep running 
until they are finished even if the executor returns a timeout => it is 
possible to set the log level of the HC to debug and check the log file 
even after the web console returned a timeout
* If there is trouble, it is usually necessary to have a look at the 
proper log files anyway (for instance, a HC result is currently not 
capable of carrying an exception)
* If really a longer timeout is needed, it is always possible to 
configure a higher timeout in the OSGi config of the HealthCheckExecutor 
(this is always possible in DEV/TEST envs)

-Georg

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Dec 31, 2013 at 11:18 AM, Georg Henzler
<sl...@cq-eclipse-plugin.net> wrote:
> My cwiki username is henzlerg...

ok, you should have write access now.

>...IMHO caching with short TTLs is
> totally fine for use case C) as well...

That might work...OTOH I want to have the execution timeout specified
in the execute call, as being able to change it is very useful when
troubleshooting things from the webconsole.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

Hi Bertrand,

thanks for creating the Wiki Page, I suppose this is a good way to get 
to a common understanding. My cwiki username is henzlerg.

Once you grant access to my user I'd like to create API Variant B and 
slightly change use case C) - IMHO caching caching with short TTLs is 
totally fine for use case C) as well. (I think nobody would configure 
more than 1 or 2sec as TTL as the main reason for the caching in the 
first place is attribute access from JMX, and that happens within milli 
seconds I suppose).

Regards
Georg

Am 31.12.2013 12:12, schrieb Bertrand Delacretaz:
> Hi,
>
> On Mon, Dec 30, 2013 at 3:58 PM, Georg Henzler
> <sl...@cq-eclipse-plugin.net> wrote:
>> ...now I'm afraid we might go a step backward... some things that 
>> you suggest
>> to be unnecessary are essentially important from my point of view -
>> supported by one year experience getting to this stage...
>
> Yes, it seems like we have slightly different ideas about what's 
> needed.
>
> I have tried to clarify the use cases that we envision, at
>
> 
> https://cwiki.apache.org/confluence/display/SLING/Health+Checks+Executor+Design
>
> At this point my goal is only to get the API right, as once we 
> release
> it it's harder to change it. Once we agree on that we can proceed 
> with
> the implementation.
>
> Georg, let us know your cwiki.apache.org username if you want write
> access to that.
>
> -Bertrand

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Mon, Dec 30, 2013 at 3:58 PM, Georg Henzler
<sl...@cq-eclipse-plugin.net> wrote:
> ...now I'm afraid we might go a step backward... some things that you suggest
> to be unnecessary are essentially important from my point of view -
> supported by one year experience getting to this stage...

Yes, it seems like we have slightly different ideas about what's needed.

I have tried to clarify the use cases that we envision, at

https://cwiki.apache.org/confluence/display/SLING/Health+Checks+Executor+Design

At this point my goal is only to get the API right, as once we release
it it's harder to change it. Once we agree on that we can proceed with
the implementation.

Georg, let us know your cwiki.apache.org username if you want write
access to that.

-Bertrand

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

Hi Betrand and Christian,

now I'm afraid we might go a step backward... some things that you 
suggest to be unnecessary are essentially important from my point of 
view - supported by one year experience getting to this stage :).  Here 
are my explanations:

--- RE: executeXXX Methods

We definitely need the executeForTags(String... tags) and executeAll() 
methods because only this way we get parallel execution with intelligent 
timeout handling and current results:

- If all checks are quick (let's say the the longest check requires 
100ms so finish), executeXxx() will return after 100ms having all 
current results (Bertrand, your patch was returning the results of the 
last call, if the last call was 2h ago, all results would be two hours 
old!)
- If all checks except one are quick and let's say the one check takes 
3s timing out, we get all current results for the quick checks and one 
WARN timeout result for one check (if we wanted we could change it to 
the last cached result, but for my last project this behaviour worked 
better, also see next bullet point)
- The global timeout is normally configured reasonably ensuring that 
when things are ok, checks never time out
- Sometimes, if things go bad, a check can be stuck completely (I know 
this shouldn't happen, but I have seen it happening). For this case a 
special timeout can be configured marking checks as critical if they 
exceed this timeout (10sec default)

The current implementation is robust and correct in many ways:
- no matter how many calls to the health check executor are made (e.g. 
100 per sec or only 1 every day), we get the latest results (respecting 
the cache TTL) but without ever executing the same check twice (two 
request to the web console started at the exact same time will only 
cause each check to be called once and the two requests will return the 
exact same result)
- it uses never more threads than checks exist in the system
- no matter how bad one check is implemented or how "sick" the aspect 
the check observes is, it will never in any way cause the executor to 
stop working except that it returns red for the check (returning the 
results for the other checks correctly)

--- RE: Health Check MetaData

It is important to distinguish two aspects:

1) defining the meta data
2) allow access to the meta data

to 1) I definitely think this should be done in a declarative way (not 
programmatically by adding a method to the HealthCheck interface!). 
There is just no need for touring complete code here. Using the service 
properties as it's done in the moment is IMHO the best way.

to 2) There should be definitely a way to access the meta data from the 
result. My way was to use the class HealthCheckDescriptor and add a 
method result.getHealthCheckDescriptor() to the result (funny enough 
that class was called HealthCheckMetaData in my first version but I 
renamed it later because I thought Descriptor is better). Having a flat 
result structure was requested by Carsten, that's why we have the 
methods getHealthCheckName() and getHealthCheckTags() now. The cleanest 
solution my opinion is to go back to result.getHealthCheckDescriptor() - 
or we can call it getHealthCheckMetaData() if you prefer. Also we can 
create an interface HealthCheckMetaData() and keep that in line with the 
service properties.

--- RE: "why can't we provide these values from the HealthCheck that 
was used to create it?" (these values=timing information)

Because
- every check would have to do it (code duplication, pushing complexity 
to the implementor)
- it cannot be guaranteed, that every health check does it correctly


--- RE: UnsupportedOperationException in Result

Using the UnsupportedOperationException is IMHO an elegant solution. 
The exception is part of the JDK and for instance the Collection API is 
using it as well. It is elegant because
- From a conceptual/object oriented view, it totally makes sense to 
have getElapsedTime() or getFinishedAt() as properties of a result. 
However, the HC itself cannot provide this information (see last 
section) and it's fine to throw UnsupportedOperationException for these 
results then.
- If the methods would be taken away from the interface, a cast to 
ExecutionResult would be needed (that is hidden at the moment anyway) to 
show this information in the web console
- From an implementors point of view, it is quickly obvious that the 
methods are only available when using the executor (the exception 
message is pretty clear)
- Typical implementors would not even notice as they just construct the 
result, but don't call methods on it (that is the job of the web console 
and JMX)
- We should have getElapsedTime() and getFinishedAt() in the 
HealthCheckResult interface because this information is useful for human 
observers (and should be shown in web console / JMX)


-Georg


Am 30.12.2013 11:30, schrieb Carsten Ziegeler:
> The more I think about it, the more I get the feeling that if we 
> can't away
> with execute(ServiceReference) in the executor api, this should be 
> the only
> method of that service.
> The two others can easily be replaced by using the HCFilter and then
> calling the method above for each service reference. It's just one 
> line
> more code, but this would completely remove the need to add meta
> information about the HC in the result and make the whole 
> implementation
> easier.
>
> WDYT?
>
> Carsten
>
>
> 2013/12/30 Carsten Ziegeler <cz...@apache.org>
>
>> 2013/12/27 Bertrand Delacretaz <bd...@apache.org>
>>
>>> Hi Carsten and Georg,
>>>
>>> I was going to say that I still disagree, but being outnumbered I'd
>>> let it go if it's just about a single method that I don't 
>>> like...and
>>> then I reviewed the org.apache.sling.hc.api package once again and
>>> it's worse than I thought.
>>>
>> The problem is really, if we don't have a method with this signature 
>> how
>> can the jmx implementation make use of the executor while the 
>> executor is
>> still able to cache the result, do async processing etc. ? I don't 
>> see any
>> other solution atm, apart from moving the jmx implementation into 
>> the core.
>>
>>
>>>
>>> You guys both say you don't want metadata in APIs...and the
>>> HealthCheckResult interface that you added has these two methods
>>> (which make sense BTW):
>>>
>>>   String getHealthCheckName();
>>>   List<String> getHealthCheckTags();
>>>
>>> So a HC is not brave enough to have this in its API, but a Result
>>> needs them? This clearly demonstrates that we need a
>>> HealthCheckMetadata interface that provides those values.
>>>
>>> I think these are different concerns, a service(!) does not need an 
>>> api
>> to expose the metadata because it's already in the service 
>> registration.
>> When we return a set of results, there need to be way of correlation
>> between a single result and an actual check, otherwise you get a 
>> bunch of
>> results without any additional information about the check.
>> The way out of this, would be to just have a single method
>> execute(ServiceReference) in the executor :) - but as soon as we 
>> allow
>> execution of more than a single HC we need this additional 
>> information.
>>
>>
>>> On top of that, the Result throws UnsupportedException on these
>>> methods...how ugly is that, why can't it provide these values from 
>>> the
>>> HealthCheck that was used to create it?
>>>
>> I haven't looked deeply at this part yet...
>>
>>>
>>> To me this smells of implementation-driven APIs, we are used to 
>>> better
>>> design in Sling.
>>>
>>> As pointed out, I'm all ears for a better solution of the executor
>> interface that allows us to get the above mentioned features. And as
>> pointed out below, if we simplify the executor service to a single 
>> method,
>> we don't need to make any changes to Result or add any additional
>> information.
>>
>> Carsten
>>
>> What do others think?
>>>
>>> -Bertrand
>>>
>>> [1]
>>>
>>>
>>> 
>>> https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api
>>>
>>
>>
>>
>> --
>> Carsten Ziegeler
>> cziegeler@apache.org
>>

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

The more I think about it, the more I get the feeling that if we can't away
with execute(ServiceReference) in the executor api, this should be the only
method of that service.
The two others can easily be replaced by using the HCFilter and then
calling the method above for each service reference. It's just one line
more code, but this would completely remove the need to add meta
information about the HC in the result and make the whole implementation
easier.

WDYT?

Carsten


2013/12/30 Carsten Ziegeler <cz...@apache.org>

> 2013/12/27 Bertrand Delacretaz <bd...@apache.org>
>
>> Hi Carsten and Georg,
>>
>> I was going to say that I still disagree, but being outnumbered I'd
>> let it go if it's just about a single method that I don't like...and
>> then I reviewed the org.apache.sling.hc.api package once again and
>> it's worse than I thought.
>>
> The problem is really, if we don't have a method with this signature how
> can the jmx implementation make use of the executor while the executor is
> still able to cache the result, do async processing etc. ? I don't see any
> other solution atm, apart from moving the jmx implementation into the core.
>
>
>>
>> You guys both say you don't want metadata in APIs...and the
>> HealthCheckResult interface that you added has these two methods
>> (which make sense BTW):
>>
>>   String getHealthCheckName();
>>   List<String> getHealthCheckTags();
>>
>> So a HC is not brave enough to have this in its API, but a Result
>> needs them? This clearly demonstrates that we need a
>> HealthCheckMetadata interface that provides those values.
>>
>> I think these are different concerns, a service(!) does not need an api
> to expose the metadata because it's already in the service registration.
> When we return a set of results, there need to be way of correlation
> between a single result and an actual check, otherwise you get a bunch of
> results without any additional information about the check.
> The way out of this, would be to just have a single method
> execute(ServiceReference) in the executor :) - but as soon as we allow
> execution of more than a single HC we need this additional information.
>
>
>> On top of that, the Result throws UnsupportedException on these
>> methods...how ugly is that, why can't it provide these values from the
>> HealthCheck that was used to create it?
>>
> I haven't looked deeply at this part yet...
>
>>
>> To me this smells of implementation-driven APIs, we are used to better
>> design in Sling.
>>
>> As pointed out, I'm all ears for a better solution of the executor
> interface that allows us to get the above mentioned features. And as
> pointed out below, if we simplify the executor service to a single method,
> we don't need to make any changes to Result or add any additional
> information.
>
> Carsten
>
> What do others think?
>>
>> -Bertrand
>>
>> [1]
>>
>>
>> https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api
>>
>
>
>
> --
> Carsten Ziegeler
> cziegeler@apache.org
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

2013/12/27 Bertrand Delacretaz <bd...@apache.org>

> Hi Carsten and Georg,
>
> I was going to say that I still disagree, but being outnumbered I'd
> let it go if it's just about a single method that I don't like...and
> then I reviewed the org.apache.sling.hc.api package once again and
> it's worse than I thought.
>
The problem is really, if we don't have a method with this signature how
can the jmx implementation make use of the executor while the executor is
still able to cache the result, do async processing etc. ? I don't see any
other solution atm, apart from moving the jmx implementation into the core.


>
> You guys both say you don't want metadata in APIs...and the
> HealthCheckResult interface that you added has these two methods
> (which make sense BTW):
>
>   String getHealthCheckName();
>   List<String> getHealthCheckTags();
>
> So a HC is not brave enough to have this in its API, but a Result
> needs them? This clearly demonstrates that we need a
> HealthCheckMetadata interface that provides those values.
>
> I think these are different concerns, a service(!) does not need an api to
expose the metadata because it's already in the service registration.
When we return a set of results, there need to be way of correlation
between a single result and an actual check, otherwise you get a bunch of
results without any additional information about the check.
The way out of this, would be to just have a single method
execute(ServiceReference) in the executor :) - but as soon as we allow
execution of more than a single HC we need this additional information.


> On top of that, the Result throws UnsupportedException on these
> methods...how ugly is that, why can't it provide these values from the
> HealthCheck that was used to create it?
>
I haven't looked deeply at this part yet...

>
> To me this smells of implementation-driven APIs, we are used to better
> design in Sling.
>
> As pointed out, I'm all ears for a better solution of the executor
interface that allows us to get the above mentioned features. And as
pointed out below, if we simplify the executor service to a single method,
we don't need to make any changes to Result or add any additional
information.

Carsten

What do others think?
>
> -Bertrand
>
> [1]
>
>
> https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api
>



-- 
Carsten Ziegeler
cziegeler@apache.org

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Carsten and Georg,

I was going to say that I still disagree, but being outnumbered I'd
let it go if it's just about a single method that I don't like...and
then I reviewed the org.apache.sling.hc.api package once again and
it's worse than I thought.

You guys both say you don't want metadata in APIs...and the
HealthCheckResult interface that you added has these two methods
(which make sense BTW):

  String getHealthCheckName();
  List<String> getHealthCheckTags();

So a HC is not brave enough to have this in its API, but a Result
needs them? This clearly demonstrates that we need a
HealthCheckMetadata interface that provides those values.

On top of that, the Result throws UnsupportedException on these
methods...how ugly is that, why can't it provide these values from the
HealthCheck that was used to create it?

To me this smells of implementation-driven APIs, we are used to better
design in Sling.

What do others think?

-Bertrand

[1]

https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api

Re: HealthCheckExecutor.execute(ServiceReference)??

Posted by Georg Henzler <sl...@cq-eclipse-plugin.net>.

I agree, I would definitely not force implementors to provide the 
method HealthCheck.getMetadata() - I believe it's rarely a good idea to 
provide meta data programmatically (if we really wanted to have a unique 
ID, we could add it as service properties like the name or the tags).

Additionally, forcing an implementor to define an id for a health check 
would feel awkward to 99% of the people, as they only provide them but 
never use it because the hypothetical signature 
HealthCheckExecutor.execute(hcId) would normally only be used by JMX.

-Georg

Am 24.12.2013 12:11, schrieb Carsten Ziegeler:
> It's not just about the metadata - it's about uniquely identifying a 
> HC
> within the executor - the service object is not a good key, 
> especially as
> we have no way to discard it from the cache. The service reference or 
> the
> service id contained within the reference is a good key. And uniquely
> identifying (for long running hcs) and caching the hc result is for 
> me the
> number one reason for the executor service.
>
> We should not change the HC interface as this is an incompatible 
> change and
> would break all existing implementations - as we've discussed during
> defining the first version of the API, there is no real need to have 
> the
> name, tags etc. in the HC interface. Services executing an HC get the
> service anyway from the service registry together with these 
> registration
> properties and it makes the implementation unnecessarily complicated.
>
> So we need something to uniquely identify a health check and as all
> metadata is optional we don't have anything to be used. Now, as I 
> wrote in
> the issue, I'm fine with having an execute(String name) method which 
> tries
> to execute a HC with the given name. This would require that a HC 
> which can
> be executed by this service either has a tag or a name.
> On the other hand, if we go with ServiceReference we can execute all 
> HCs
> through this service. And as Georg has pointed out, there are not 
> many
> clients of this service anyway: its the jmx implementation and the 
> web
> console.
>
>
> Carsten
>
>
> 2013/12/24 Bertrand Delacretaz <bd...@apache.org>
>
>> Hi,
>>
>> About SLING-3278 - IMO execute(ServiceReference) at [1] is an
>> unnecessary leak of implementation details, it makes no sense at the
>> API level. It's HealthChecks that we are executing, not service
>> references.
>>
>> IIUC the root cause is that the HealthCheck API doesn't provide
>> metadata that it should: tags, name, optional JMX MBean name, unique
>> ID maybe, all these things clearly belong to a HealthCheck, so the 
>> API
>> should provide access to them, even if they are defined by service
>> properties.
>>
>> I suggest adding a HealthCheckMetadata interface to provide those
>> things (tags, name, optional things in a Map maybe), and add a
>> HealthCheck.getMetadata() method that returns this. That'll break
>> existing HealthCheck implementations, which we indicate by bumping 
>> up
>> the package version number, as this quite a new API it's better to 
>> fix
>> it now than later.
>>
>> -Bertrand
>>
>> [1]
>> 
>> https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api/HealthCheckExecutor.java
>>

Re: HealthCheckExecutor.execute(ServiceReference) ??

Posted by Carsten Ziegeler <cz...@apache.org>.

It's not just about the metadata - it's about uniquely identifying a HC
within the executor - the service object is not a good key, especially as
we have no way to discard it from the cache. The service reference or the
service id contained within the reference is a good key. And uniquely
identifying (for long running hcs) and caching the hc result is for me the
number one reason for the executor service.

We should not change the HC interface as this is an incompatible change and
would break all existing implementations - as we've discussed during
defining the first version of the API, there is no real need to have the
name, tags etc. in the HC interface. Services executing an HC get the
service anyway from the service registry together with these registration
properties and it makes the implementation unnecessarily complicated.

So we need something to uniquely identify a health check and as all
metadata is optional we don't have anything to be used. Now, as I wrote in
the issue, I'm fine with having an execute(String name) method which tries
to execute a HC with the given name. This would require that a HC which can
be executed by this service either has a tag or a name.
On the other hand, if we go with ServiceReference we can execute all HCs
through this service. And as Georg has pointed out, there are not many
clients of this service anyway: its the jmx implementation and the web
console.


Carsten


2013/12/24 Bertrand Delacretaz <bd...@apache.org>

> Hi,
>
> About SLING-3278 - IMO execute(ServiceReference) at [1] is an
> unnecessary leak of implementation details, it makes no sense at the
> API level. It's HealthChecks that we are executing, not service
> references.
>
> IIUC the root cause is that the HealthCheck API doesn't provide
> metadata that it should: tags, name, optional JMX MBean name, unique
> ID maybe, all these things clearly belong to a HealthCheck, so the API
> should provide access to them, even if they are defined by service
> properties.
>
> I suggest adding a HealthCheckMetadata interface to provide those
> things (tags, name, optional things in a Map maybe), and add a
> HealthCheck.getMetadata() method that returns this. That'll break
> existing HealthCheck implementations, which we indicate by bumping up
> the package version number, as this quite a new API it's better to fix
> it now than later.
>
> -Bertrand
>
> [1]
> https://svn.apache.org/repos/asf/sling/trunk/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api/HealthCheckExecutor.java
>



-- 
Carsten Ziegeler
cziegeler@apache.org