Posted to dev@activemq.apache.org by Domenico Francesco Bruscino <br...@gmail.com> on 2020/04/28 09:52:10 UTC

[DISCUSS] Artemis health check tool (ARTEMIS-2739)

I'm implementing a tool to determine whether the broker is in a healthy
state. There is a series of health checks that can be performed, ranging
from the most basic, which very rarely produce false positives, to
increasingly comprehensive, intrusive, and opinionated ones, which have a
higher probability of false positives.

Here are some possible health checks, grouped by target:
- node
  - up - check if a client can connect to the node
  - disk - check if the disk usage has hit the `max-disk-usage` limit
  - memory - check the memory available to the JVM
  - backup - check if the backup node is announced
  - queues - check if all queues with a positive rate have a consumer
- queue
  - up - check if the queue exists
  - browser - check if the queue is browsable
  - consumer - check if a consumer can connect to the queue and/or receive
    messages
  - producer - check if a producer can connect to the queue and/or send
    messages
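The disk and memory checks can be approximated with plain JDK calls. This is only an illustrative sketch: `NodeResourceChecks`, `diskOk`, `memoryOk` and their threshold parameters are hypothetical names, the disk threshold is assumed to be a percentage of the partition (for example 90), in the spirit of `max-disk-usage`, and the sketch does not claim to match the broker's own disk-usage calculation.

```java
import java.io.File;

/** Hypothetical node-level resource checks built only on JDK APIs. */
final class NodeResourceChecks {

    private NodeResourceChecks() {
    }

    /**
     * Returns true if the used share of the partition holding {@code dir} is below
     * {@code maxDiskUsagePercent} (assumed to be a percentage, e.g. 90).
     */
    static boolean diskOk(File dir, double maxDiskUsagePercent) {
        long total = dir.getTotalSpace();
        if (total == 0L) {
            return false; // the path does not name a partition (e.g. it does not exist)
        }
        long usable = dir.getUsableSpace();
        double usedPercent = 100.0 * (total - usable) / total;
        return usedPercent < maxDiskUsagePercent;
    }

    /**
     * Returns true if the heap currently in use is below {@code maxHeapUsagePercent}
     * of the maximum heap the JVM may use (a hypothetical threshold).
     */
    static boolean memoryOk(double maxHeapUsagePercent) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        double usedPercent = 100.0 * used / rt.maxMemory();
        return usedPercent < maxHeapUsagePercent;
    }
}
```

Both checks are cheap and non-intrusive, so they sit at the "rarely produces false positives" end of the list.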

I would start with some of these checks, exposing them through the MBean
interfaces and/or the command-line utility.
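As a rough illustration of exposing a check over JMX, the sketch below registers a read-only `Healthy` attribute on the JDK's platform MBean server. It is a hypothetical example, not the broker's management classes: `HealthCheckBean`, `register` and the `com.example.health` domain are invented names. It implements `DynamicMBean` so that the check is evaluated each time the attribute is read.

```java
import java.lang.management.ManagementFactory;
import java.util.function.BooleanSupplier;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ObjectName;
import javax.management.ReflectionException;

/** Hypothetical MBean exposing a single read-only "Healthy" attribute. */
final class HealthCheckBean implements DynamicMBean {
    private final BooleanSupplier check;

    HealthCheckBean(BooleanSupplier check) {
        this.check = check;
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if ("Healthy".equals(name)) {
            return check.getAsBoolean(); // evaluated on every read, never cached
        }
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String name : names) {
            try {
                list.add(new Attribute(name, getAttribute(name)));
            } catch (AttributeNotFoundException ignored) {
                // unknown attributes are left out of the result
            }
        }
        return list;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(action));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo healthy = new MBeanAttributeInfo(
                "Healthy", "boolean", "Result of the health check", true, false, false);
        return new MBeanInfo(getClass().getName(), "Hypothetical health check",
                new MBeanAttributeInfo[] {healthy}, null, null, null);
    }

    /** Registers a check on the platform MBean server under a hypothetical domain. */
    static ObjectName register(String checkName, BooleanSupplier check) throws JMException {
        ObjectName name = new ObjectName(
                "com.example.health:type=HealthCheck,name=" + ObjectName.quote(checkName));
        ManagementFactory.getPlatformMBeanServer().registerMBean(new HealthCheckBean(check), name);
        return name;
    }
}
```

Any `BooleanSupplier` can be plugged in, and the result is then visible to any JMX client, such as JConsole.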

What are your thoughts?

Domenico

Re: [DISCUSS] Artemis health check tool (ARTEMIS-2739)

Posted by Gary Tully <ga...@gmail.com>.

I imagine the equivalent of the Oracle DB query "SELECT * FROM DUAL",
i.e. something that exercises the server.

A combination of produce and consume, on some existing queue or on a
temp queue created for that purpose.
I guess an existing queue may be better, because on production systems
queue creation may be locked down.

This covers any potential unexpected blocking. The caveat, though, is
that blocking can be a reasonable response for a queuing system that
has reached some limits.
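A minimal sketch of the produce-and-consume probe, assuming a dedicated probe queue that is normally empty. An in-memory `BlockingQueue<String>` stands in for the broker queue, so this only shows the control flow: bounded waits, and a distinct result for blocking rather than a plain failure. `RoundTripProbe` and `RoundTripResult` are hypothetical names.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Possible outcomes of a produce-then-consume probe. */
enum RoundTripResult { OK, PRODUCE_BLOCKED, CONSUME_TIMED_OUT, UNEXPECTED_MESSAGE }

/** Hypothetical "SELECT * FROM DUAL" style probe against a dedicated, normally empty queue. */
final class RoundTripProbe {

    private RoundTripProbe() {
    }

    static RoundTripResult probe(BlockingQueue<String> queue, long timeoutMillis)
            throws InterruptedException {
        String token = "health-probe-" + UUID.randomUUID();
        // A bounded wait separates "blocked" from "broken": blocking may be a legitimate
        // reaction to limits, so it gets its own result.
        if (!queue.offer(token, timeoutMillis, TimeUnit.MILLISECONDS)) {
            return RoundTripResult.PRODUCE_BLOCKED;
        }
        String received = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (received == null) {
            return RoundTripResult.CONSUME_TIMED_OUT;
        }
        // On a shared queue this poll could take a real message, hence the dedicated queue.
        return token.equals(received) ? RoundTripResult.OK : RoundTripResult.UNEXPECTED_MESSAGE;
    }
}
```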

A system that cannot produce may be healthy if it can browse.

To that end, maybe we need a preconfigured queue that has one message
on it.
We verify we can browse it, then, *if* we can produce to it within some
small timeout, we consume the old message, essentially replacing the
single entry on the queue.

Periodic monitoring would cycle the head of the queue; if producing
blocks but browsing still works, that would indicate the broker is
healthy but has been blocked since the message-in time of the head of
the queue.

It is a sort of multi-value return, for example: -1 cannot browse;
0 all good (replaced the head); > 0 the time of the head of the queue.
I guess it could also be red, green, amber, but that is more vague. It
could be turned into that!
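The single-message probe can be sketched as below. This is a sketch under stated assumptions, not Artemis code: a `BlockingQueue<Long>` stands in for the preconfigured queue (each payload is the epoch-millis time it was produced), `peek()` stands in for browsing, a bounded capacity stands in for whatever makes the broker block producers, nothing else is assumed to use the probe queue, and `HeadCyclingProbe`, `Health` and `toHealth` are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical traffic-light view of the probe result. */
enum Health { RED, AMBER, GREEN }

/** Sketch of the single-message probe on a dedicated queue seeded with one message. */
final class HeadCyclingProbe {

    private HeadCyclingProbe() {
    }

    /** Returns -1 if the head cannot be browsed, 0 if it was replaced, or the head's timestamp if producing blocked. */
    static long probe(BlockingQueue<Long> queue, long timeoutMillis) throws InterruptedException {
        Long head = queue.peek(); // "browse": look at the head without consuming it
        if (head == null) {
            return -1; // cannot browse: the seeded message is missing
        }
        if (!queue.offer(System.currentTimeMillis(), timeoutMillis, TimeUnit.MILLISECONDS)) {
            return head; // healthy but blocked since the head was produced
        }
        queue.poll(); // consume the old head so exactly one (newer) message remains
        return 0;
    }

    /** Maps the numeric result onto red (cannot browse), green (replaced) or amber (blocked). */
    static Health toHealth(long result) {
        if (result < 0) {
            return Health.RED;
        }
        return result == 0 ? Health.GREEN : Health.AMBER;
    }
}
```

Returning the head's timestamp when blocked lets a periodic monitor report how long the broker has been unable to accept new messages, without treating that state as a hard failure.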

What counts as good health is very context-specific, but I think a
framework like this could be generally useful and provide an example of
how more context-specific health checks could be achieved.
Maybe some food for thought.

/gary

Re: [DISCUSS] Artemis health check tool (ARTEMIS-2739)

Posted by brusdev <br...@gmail.com>.

Hi Gary and Justin,

Thanks for your suggestions. I created PR #3118[1] to add a new `check`
command to the command-line utility. This command exposes some checks
for nodes and queues, most of them implemented using the management API.
The checks are modular, so each user can compose their own health check;
e.g., to produce to and consume from a queue, the command is
`artemis check queue --name TEST --produce 1 --consume 1`.

[1] https://github.com/apache/activemq-artemis/pull/3118
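The modular idea can be illustrated with a small composable runner. This is a hypothetical sketch, not the code in PR #3118: `HealthCheckSuite` and its methods are invented for illustration, and each check is simply a `BooleanSupplier`, loosely mirroring how options such as `--produce 1 --consume 1` add steps to a `check queue` run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical composable health check: named, independent checks run in insertion order. */
final class HealthCheckSuite {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Adds a check; returns this so calls can be chained. */
    HealthCheckSuite add(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    /** Runs every check, treating an exception as a failure, and returns the names of failed checks. */
    List<String> run() {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean passed;
            try {
                passed = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                passed = false;
            }
            if (!passed) {
                failures.add(entry.getKey());
            }
        }
        return failures;
    }
}
```

A check that throws is reported as failed rather than aborting the run, so one broken step does not hide the results of the others.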

Domenico

Re: [DISCUSS] Artemis health check tool (ARTEMIS-2739)

Posted by Justin Bertram <jb...@apache.org>.

My recommendation would be to keep things simple. There isn't going to be
one test that will satisfy all use cases.

It's worth noting that we already have many ways to inspect different
data points on the broker via the management API. You can access data
about message counts, consumer counts, connection counts, etc. via HTTP
(i.e. using Jolokia), JMX, and management messages via most of our
supported protocols.
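As a hedged illustration of building a check on top of data the management API already exposes, here is a client-side sketch over JMX. The ObjectName pattern the caller passes in and the `MessageCount`/`ConsumerCount` attribute names are assumptions about the broker's queue MBeans, `QueueConsumerCheck` is a hypothetical name, and "has messages but no consumers" is a simplification of the positive-rate criterion from the original proposal.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical client-side check: flag queues that hold messages but have no consumers. */
final class QueueConsumerCheck {

    private QueueConsumerCheck() {
    }

    /**
     * @param conn         a local MBeanServer or a remote JMX connection
     * @param queuePattern an ObjectName pattern matching the queue MBeans (naming is an assumption)
     */
    static List<ObjectName> queuesWithoutConsumers(MBeanServerConnection conn, ObjectName queuePattern)
            throws JMException, IOException {
        List<ObjectName> stuck = new ArrayList<>();
        for (ObjectName queue : conn.queryNames(queuePattern, null)) {
            // Attribute names assumed; read as Number so int or long values both work.
            long messages = ((Number) conn.getAttribute(queue, "MessageCount")).longValue();
            long consumers = ((Number) conn.getAttribute(queue, "ConsumerCount")).longValue();
            if (messages > 0 && consumers == 0) {
                stuck.add(queue);
            }
        }
        return stuck;
    }
}
```

Because it only needs an `MBeanServerConnection`, the same method works against a remote broker through a connection obtained from `JMXConnectorFactory.connect(...)`, with the threshold logic kept outside the broker.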

We also expose the most critical data through metrics plugins. This data
can be consumed by tools like Prometheus and, ultimately, Grafana, which
have sophisticated monitoring and alerting capabilities that I would not
want to try to reproduce in the broker itself. Perhaps there are some
metrics that we don't currently expose that would be useful here, but in
general I think most of the raw data is available for any user to
determine the health of their broker as it fits their use case.

Justin
