Posted to dev@ignite.apache.org by Denis Mekhanikov <dm...@gmail.com> on 2020/08/05 11:04:23 UTC

Distinguishing node-local metrics from cluster-wide

Hi Igniters!

My team and I are building a monitoring system on top of the new metrics
framework described in the following IEP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
So far it's going well, but we'd like to improve the way metrics are
exported from Ignite.

There are different kinds of metrics that you can access through this
framework. Some of them are local to a node, like used heap or CPU load.
It makes sense to send them to the centralized storage independently from
every node. Let's assume that we attach the node ID to metric names, so
that we can distinguish between metrics coming from different nodes.
Local metrics can then be selected with patterns on metric names. For
example, if I want to draw a chart of CPU load on every node, I can use a
pattern similar to the following one: sys.CpuLoad.*
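
To make the naming scheme concrete, here is a minimal sketch in Python. It
is not Ignite code: the helper names (qualify, select), the node IDs and the
sample values are made up for illustration, and I assume the node ID is
appended as the last segment of the metric name.

```python
from fnmatch import fnmatchcase


def qualify(metric_name, node_id):
    """Append the node ID as the last segment of the metric name (assumed layout)."""
    return f"{metric_name}.{node_id}"


def select(samples, pattern):
    """Return every sample whose fully qualified name matches the glob pattern."""
    return {name: value for name, value in samples.items()
            if fnmatchcase(name, pattern)}


# Hypothetical samples, as they might look in the centralized storage.
samples = {
    qualify("sys.CpuLoad", "node-a"): 0.42,
    qualify("sys.CpuLoad", "node-b"): 0.17,
    qualify("sys.HeapUsed", "node-a"): 512,
}

# One series per node, which is what a per-node CPU chart needs.
cpu_by_node = select(samples, "sys.CpuLoad.*")
```

With this layout the pattern yields one series per node, which is the
behavior we want for local metrics.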

There are also metrics that have the same value no matter which node the
metric is taken from. For example, cache size, rebalance progress and
topology version are global things that don't depend on the node. If I
take any one of the metrics matching the pattern pme.Duration.*, I will get
what I need.
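
As a rough sketch of what "take any of them" means on the consumer side,
assuming the same node-ID suffix on metric names: the any_of helper and the
sample values below are hypothetical, and the function is not part of Ignite
or of any monitoring tool.

```python
from fnmatch import fnmatchcase


def any_of(samples, pattern):
    """Return the value of one series matching the pattern.

    For a global metric every node reports the same value, so it does not
    matter which series is picked; names are sorted only to keep the choice
    deterministic.
    """
    for name in sorted(samples):
        if fnmatchcase(name, pattern):
            return samples[name]
    raise KeyError(f"no metric matches {pattern!r}")


# Hypothetical samples: both nodes report the same global value.
samples = {
    "pme.Duration.node-a": 120,
    "pme.Duration.node-b": 120,
    "sys.CpuLoad.node-a": 0.42,
}

pme_duration = any_of(samples, "pme.Duration.*")
```

The drawback is that the storage still keeps one identical series per node,
and the consumer has to know which patterns are safe to collapse this way.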

I wonder, what is the recommended approach to global metrics? I know that
there are tools like Prometheus and Graphite that allow similar
manipulations with metric names. Are global and local metrics supposed to
be differentiated on the side of the monitoring tools, using functions like
any(pme.Duration.*)? Graphite seems to lack such a function, for example.
Maybe it makes sense to introduce a property for metrics that will let the
exporters distinguish between the two kinds, so that names of global
metrics don't have to be parameterized with the node ID?
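
To show what I mean by such a property, here is a sketch of how an exporter
could use it. Everything in it is an assumption for the sake of discussion:
the Scope enum, the Metric class and export_name are hypothetical and do not
exist in the current metrics framework.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Hypothetical metric property: does the value depend on the node?"""
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class Metric:
    name: str
    value: float
    scope: Scope


def export_name(metric, node_id):
    """Build the exported name: only local metrics get the node ID suffix."""
    if metric.scope is Scope.GLOBAL:
        return metric.name
    return f"{metric.name}.{node_id}"


# Hypothetical metrics reported by one node.
cpu = Metric("sys.CpuLoad", 0.42, Scope.LOCAL)
pme = Metric("pme.Duration", 120, Scope.GLOBAL)

exported = {export_name(m, "node-a"): m.value for m in (cpu, pme)}
```

With something like this, every node would write a global metric under the
same name, and the monitoring tool would not need an any()-style function.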

What do you think?

Denis