Posted to dev@activemq.apache.org by Gary Tully <ga...@gmail.com> on 2020/05/05 10:28:16 UTC
Re: [DISCUSS] Artemis health check tool (ARTEMIS-2739)

I imagine the equivalent of the Oracle DB query "select * from
DUAL", i.e. something that exercises the server.

A combination of queue produce and consume, on some existing queue or
on a temp queue for that purpose.
I guess an existing queue may be better b/c on production systems
queue creation may be locked down.

This covers any potential unexpected blocking. The caveat, though, is
that blocking can be a reasonable response for a queuing system that
has reached some limits.

A system that cannot produce may be healthy if it can browse.

To that end, maybe we need a pre-configured queue that has one
message on it.
We verify we can browse it; then, *if* we can produce to it within
some small timeout, we consume the old head, essentially replacing
the single entry on the queue.

Periodic monitoring would cycle the head of the queue. Being able to
browse but not produce would indicate healthy but blocked since the
time the message at the head of the queue was enqueued.

It is some sort of multi-value return, for example:
  -1   cannot browse
   0   all good (replaced the head)
  > 0  the time of the head of the queue
I guess it could be red, green, amber also, but that is more vague. It
could be turned into that!
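
A minimal sketch of that check cycle in Java, using an in-memory
bounded queue as a stand-in for the pre-configured broker queue. The
class name HeadCycleProbe, its methods, and the colour mapping are all
hypothetical and not an Artemis API. A real check would go through a
JMS or core client and would treat a connection or browse failure as
"cannot browse"; here an empty queue plays that role, and a full queue
plays the role of a blocked producer. Timestamps are assumed to be
positive (e.g. epoch millis), so they cannot collide with -1 or 0.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical probe: browse the head, then try to replace it. */
final class HeadCycleProbe {
    static final long CANNOT_BROWSE = -1L;
    static final long ALL_GOOD = 0L;

    // Each entry is the (positive) time at which the message was enqueued.
    private final BlockingQueue<Long> queue;

    private HeadCycleProbe(BlockingQueue<Long> queue) {
        this.queue = queue;
    }

    /** Builds the stand-in queue; capacity models a broker-side limit. */
    static HeadCycleProbe withCapacity(int capacity, long... timestamps) {
        BlockingQueue<Long> q = new LinkedBlockingQueue<>(capacity);
        for (long t : timestamps) {
            q.add(t);
        }
        return new HeadCycleProbe(q);
    }

    /**
     * Returns -1 if the head cannot be browsed, 0 if the head was
     * replaced, or the enqueue time of the head if producing timed out.
     */
    long check(long nowMillis, long produceTimeoutMillis) {
        Long head = queue.peek();               // browse, non-destructive
        if (head == null) {
            return CANNOT_BROWSE;
        }
        try {
            boolean produced =
                queue.offer(nowMillis, produceTimeoutMillis, TimeUnit.MILLISECONDS);
            if (!produced) {
                return head;                    // healthy but blocked since 'head'
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
            return head;                        // treat as blocked
        }
        queue.poll();                           // consume the old head
        return ALL_GOOD;
    }

    /** One possible mapping of the result onto red/green/amber. */
    static String colour(long result) {
        if (result < 0) {
            return "red";
        }
        return result == 0 ? "green" : "amber";
    }
}
```

With a capacity of 2 and one message on the queue, check() can produce
and then consume, so the head is replaced on every cycle. With a
capacity of 1 the produce times out and the caller gets back the time
of the stuck head, which is what a monitor would alert on if it keeps
getting older.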

What is good health is very context specific, but a framework like
this could be generally useful, I think, and would provide an example
of how more context-specific health checks could be achieved.
Maybe some food for thought.

/gary

On Tue, 28 Apr 2020 at 10:52, Domenico Francesco Bruscino
<br...@gmail.com> wrote:
>
> I'm implementing a tool to determine whether the broker is in a healthy
> state. There is a series of health checks that can be performed, ranging
> from the most basic, which very rarely produce false positives, to
> increasingly comprehensive, intrusive, and opinionated ones that have a
> higher probability of false positives.
>
> In the following list there are some health checks grouped by target:
> - node
>   - up - check if a client can connect to the node
>   - disk - check if the disk hits the `max-disk-usage` limit
>   - memory - check the memory available to the JVM
>   - backup - check if the backup node is announced
>   - queues - check if all queues with a positive rate have a consumer
> - queue
>   - up - check if the queue exists
>   - browser - check if the queue is browsable
>   - consumer - check if a consumer can connect to the queue and/or receive
> messages
>   - producer - check if a producer can connect to the queue and/or send
> messages
>
> I would start with some of the previous checks, exposing them through the
> MBeans interfaces and/or the Command Line utility.
>
> What are your thoughts?
>
> Domenico