Posted to dev@accumulo.apache.org by Dave Marion <dm...@gmail.com> on 2021/10/04 11:49:33 UTC

Re: Metrics Replacement

Thanks for the information, Ed. I updated my test[1] to use a different type
of registry, and the output now seems closer to what Hadoop produces.
Here's the Hadoop output again:

1633347999547 ctx.record: Context=ctx, ProcessName=testProcess, counter=1,
gauge=2, QuantileNumI/O=0, Quantile50thPercentileQuantile=0,
Quantile75thPercentileQuantile=0, Quantile90thPercentileQuantile=0,
Quantile95thPercentileQuantile=0, Quantile99thPercentileQuantile=0,
StatNumI/O=10, StatAvgStat=10.0, StatStdevStat=31.622776601683793,
StatIMinStat=3.4028234663852886E38, StatIMaxStat=1.401298464324817E-45,
StatMinStat=3.4028234663852886E38, StatMaxStat=1.401298464324817E-45,
StatINumI/O=10

Here's the output from the new test (which prints to stdout):

gauge
             value = 2.0
quantile
             count = 1
               min = 32
               max = 32
              mean = 32.00
            stddev = 0.00
            median = 32.00
              75% <= 32.00
              95% <= 32.00
              98% <= 32.00
              99% <= 32.00
            99.9% <= 32.00
counter
             count = 1
         mean rate = 0.10 events/second
     1-minute rate = 0.18 events/second
     5-minute rate = 0.20 events/second
    15-minute rate = 0.20 events/second
stat
             count = 1
         mean rate = 0.10 calls/second
     1-minute rate = 0.20 calls/second
     5-minute rate = 0.20 calls/second
    15-minute rate = 0.20 calls/second
               min = 0.01 seconds
               max = 0.01 seconds
              mean = 0.01 seconds
            stddev = 0.00 seconds
            median = 0.01 seconds
              75% <= 0.01 seconds
              95% <= 0.01 seconds
              98% <= 0.01 seconds
              99% <= 0.01 seconds
            99.9% <= 0.01 seconds

[1] https://gist.github.com/dlmarion/67e0ed8df320633d5af23ae00d965183

On Mon, Sep 27, 2021 at 6:24 PM dev1 <de...@etcoleman.com> wrote:

> The reporting of a rate instead of the absolute count is likely because the
> logging registry is currently implemented as a StepMeterRegistry
> (https://javadoc.io/doc/io.micrometer/micrometer-core/latest/io/micrometer/core/instrument/step/StepMeterRegistry.html).
>
> "Registry that step-normalizes counts and sums to a rate/second over the
> publishing interval"
>
> Under the covers the counter just holds a count; the registry reports the
> measured value according to the conventions of the target metrics system.
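To illustrate the step-normalization quoted above, here is a minimal JDK-only sketch of one underlying count reported two ways: cumulatively (as Hadoop Metrics2 printed `counter=1`) and as a per-second rate over a fixed publishing step. The class and method names are hypothetical, and this is not Micrometer's actual StepMeterRegistry code, only the idea behind it.

```java
import java.util.concurrent.atomic.DoubleAdder;

/**
 * Hypothetical sketch: one underlying counter, two reporting styles.
 * NOT Micrometer's implementation; it only illustrates normalizing a count
 * to a rate per second over a publishing interval ("step").
 */
class StepNormalizedCounter {
    private final DoubleAdder cumulative = new DoubleAdder();  // never reset
    private final DoubleAdder currentStep = new DoubleAdder(); // reset at every publish
    private final double stepSeconds;

    StepNormalizedCounter(double stepSeconds) {
        this.stepSeconds = stepSeconds;
    }

    void increment(double amount) {
        cumulative.add(amount);
        currentStep.add(amount);
    }

    /** Absolute count since start, comparable to Hadoop's "counter=1". */
    double cumulativeCount() {
        return cumulative.sum();
    }

    /**
     * Called once per publishing interval: returns the count accumulated
     * during the step divided by the step length, then starts a new step.
     */
    double publishRatePerSecond() {
        return currentStep.sumThenReset() / stepSeconds;
    }
}
```

With a 5-second step, a single increment is published as 0.2 per second, which lines up with the `throughput=0.2/s` line in the earlier Micrometer output, while the cumulative count stays at 1. The next publish reports 0.0 because no increments happened in that step.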
>
>   1. Based on the Micrometer output, it appears that even if we can get
> the names to match (or document appropriately), users may still have to
> change their tooling based on the values that are being reported.
>
> Unfortunately this seems likely to happen, but we should be able to explain
> what is being reported (or, even better, refer to external docs). A
> Micrometer meter can be created with a description, so we should be able
> to either automate gathering the descriptions or provide a self-describing
> set of metrics. We would still need to provide a manual mapping of old to
> new names.
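A minimal sketch of the "self-describing metrics plus manual mapping" idea, assuming meter definitions are kept in one place in code. The class, the legacy names, and the new names below are hypothetical placeholders, not actual Accumulo metrics. In Micrometer, the same description string could be passed to a meter builder (for example `Counter.builder(name).description(...)`), so one source could feed both the meters and the documentation.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: meter definitions carrying a description and the legacy name they replace. */
class MeterCatalog {
    /** One documented meter; the names used below are illustrative placeholders only. */
    static final class MeterDoc {
        final String legacyName;
        final String newName;
        final String type;
        final String description;

        MeterDoc(String legacyName, String newName, String type, String description) {
            this.legacyName = legacyName;
            this.newName = newName;
            this.type = type;
            this.description = description;
        }
    }

    static final List<MeterDoc> METERS = List.of(
        new MeterDoc("ExampleQueries", "example.queries", "counter",
            "Number of example queries received"),
        new MeterDoc("ExampleQueueSize", "example.queue.size", "gauge",
            "Current number of queued example requests"));

    /** Renders a plain-text old-to-new mapping that could be pasted into release notes. */
    static String mappingTable() {
        return METERS.stream()
            .map(m -> m.legacyName + " -> " + m.newName + " (" + m.type + "): " + m.description)
            .collect(Collectors.joining("\n"));
    }
}
```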
>
> Some systems (like Prometheus) will compute descriptive statistics from the
> raw measurements. If a metric has a valid reason to report summary
> statistics, then another meter may be a better fit (either a Micrometer
> Timer or a DistributionSummary). There is a memory cost for accumulating
> summary statistics, so it may not be appropriate for every metric.
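The memory trade-off can be shown with a small JDK-only sketch (class names are hypothetical, and this is not how Micrometer's Timer or DistributionSummary work internally). A basic summary such as count, total, and max needs constant memory, whereas exact percentiles require retaining every sample, so memory grows with the number of observations. Libraries generally bound this by approximating percentiles with histograms instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Constant-memory summary: three fields regardless of how many samples are recorded. */
class BasicSummary {
    private long count;
    private double total;
    private double max = Double.NEGATIVE_INFINITY;

    void record(double value) {
        count++;
        total += value;
        max = Math.max(max, value);
    }

    long count() { return count; }
    double mean() { return count == 0 ? Double.NaN : total / count; }
    double max() { return max; }
}

/** Exact percentiles: every sample is kept, so memory grows linearly with observations. */
class ExactPercentileSummary {
    private final List<Double> samples = new ArrayList<>();

    void record(double value) {
        samples.add(value);
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    double percentile(double p) {
        if (samples.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    int retainedSamples() { return samples.size(); }
}
```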
>
>   2. It's possible that we could take a different approach, where we
> continue to use Hadoop Metrics2 internally and attempt to write a
> Micrometer sink for the Metrics2 framework for 2.x and move to Micrometer
> for the next major release. Based on the Hadoop JIRA, it does not appear
> that they have plans to move away from this framework.
>
> In my opinion, this would not be worth the effort.
>
> Ed Coleman
>
> ________________________________
> From: Dave Marion <dm...@gmail.com>
> Sent: Monday, September 27, 2021 4:52 PM
> To: dev@accumulo.apache.org <de...@accumulo.apache.org>
> Subject: Re: Metrics Replacement
>
> I created a test[1] to see the differences in the output. In this test I
> create equivalent metric objects and output them via their respective
> logging sinks.
>
> For Hadoop Metrics, it created:
>
> 1632775059897 ctx.record: Context=ctx, ProcessName=testProcess, counter=1,
> gauge=2, QuantileNumI/O=0, Quantile50thPercentileLatency=0,
> Quantile75thPercentileLatency=0, Quantile90thPercentileLatency=0,
> Quantile95thPercentileLatency=0, Quantile99thPercentileLatency=0,
> StatNumI/O=10, StatAvgLatency=10.0, StatStdevLatency=31.622776601683793,
> StatIMinLatency=3.4028234663852886E38,
> StatIMaxLatency=1.401298464324817E-45,
> StatMinLatency=3.4028234663852886E38, StatMaxLatency=1.401298464324817E-45,
> StatINumI/O=10
>
> For Micrometer, it created:
>
> [logging-metrics-publisher] INFO
>  io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - counter{}
> throughput=0.2/s
> [logging-metrics-publisher] INFO
>  io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - gauge{}
> value=10
> [logging-metrics-publisher] INFO
>  io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - stat{}
> throughput=0.2/s mean=0.01s max=0.01s
> [logging-metrics-publisher] INFO
>  io.micrometer.core.instrument.logging.LoggingMeterRegistry [] - quantile{}
> throughput=0.2/s mean=32 max=32
>
> You will see a few differences here:
>
>   1. For counters, it appears that Micrometer is dividing the value (1) by
> the number of seconds (5), but Hadoop does not. Micrometer discusses this
> at https://micrometer.io/docs/concepts#_counters
>   2. The Hadoop Metrics2 Stat object computes several statistics (avg,
> stddev, min, max, IntervalMin and IntervalMax); Micrometer does not.
>   3. I tried to use a Micrometer DistributionSummary as a replacement for
> the Hadoop Metrics2 Quantile object. It's possible I need to use a
> different object or configure it differently.
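To make difference 2 concrete, here is a JDK-only sketch of a statistic that keeps the kinds of values the Hadoop Stat output lists: sample count, average, standard deviation, min, max, and interval min/max that reset on each snapshot. It uses Welford's online algorithm; that choice is an assumption for illustration, not a copy of Hadoop's internals, so its numbers need not match Hadoop's output exactly.

```java
/**
 * Hypothetical sketch of a running statistic similar in spirit to the Hadoop
 * Metrics2 Stat output: count, mean, stddev, min and max over all time, plus
 * min/max for the current interval (reset at every snapshot).
 * Mean and variance use Welford's numerically stable online algorithm.
 */
class RunningStat {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double intervalMin = Double.POSITIVE_INFINITY;
    private double intervalMax = Double.NEGATIVE_INFINITY;

    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
        intervalMin = Math.min(intervalMin, x);
        intervalMax = Math.max(intervalMax, x);
    }

    long count() { return count; }
    double mean() { return count == 0 ? 0.0 : mean; }
    /** Sample standard deviation (n - 1 in the denominator). */
    double stddev() { return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0; }
    double min() { return min; }
    double max() { return max; }
    double intervalMin() { return intervalMin; }
    double intervalMax() { return intervalMax; }

    /** Starts a new interval: interval min/max reset, all-time values are kept. */
    void resetInterval() {
        intervalMin = Double.POSITIVE_INFINITY;
        intervalMax = Double.NEGATIVE_INFINITY;
    }
}
```

As a side note, the `StatIMinLatency=3.4028234663852886E38` and `StatIMaxLatency=1.401298464324817E-45` values in the Hadoop output are `Float.MAX_VALUE` and `Float.MIN_VALUE`, which look like initial sentinels that no sample replaced; the sketch uses infinities for the same role.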
>
> Some thoughts:
>
>   1. Based on the Micrometer output, it appears that even if we can get the
> names to match (or document appropriately), users may still have to change
> their tooling based on the values that are being reported.
>   2. It's possible that we could take a different approach, where we
> continue to use Hadoop Metrics2 internally and attempt to write a
> Micrometer sink for the Metrics2 framework for 2.x and move to Micrometer
> for the next major release. Based on the Hadoop JIRA, it does not appear
> that they have plans to move away from this framework.
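The "keep Metrics2 internally, forward to Micrometer" option is essentially an adapter. The sketch below shows only that shape: both interfaces are simplified stand-ins invented for this example, not the real Hadoop `MetricsSink` or Micrometer `MeterRegistry` APIs, and the CamelCase-to-dotted rule is a naive assumption.

```java
import java.util.Map;

/** Hypothetical stand-in for a Metrics2-style sink: receives one snapshot per record. */
interface SimpleRecordSink {
    void putMetrics(String recordName, Map<String, Number> metrics);
}

/** Hypothetical stand-in for a Micrometer-style registry. */
interface SimpleGaugeRegistry {
    void setGauge(String name, double value);
}

/** Adapter: renames Hadoop-style CamelCase metrics to dotted lowercase and forwards the values. */
class ForwardingSink implements SimpleRecordSink {
    private final SimpleGaugeRegistry target;

    ForwardingSink(SimpleGaugeRegistry target) {
        this.target = target;
    }

    @Override
    public void putMetrics(String recordName, Map<String, Number> metrics) {
        for (Map.Entry<String, Number> e : metrics.entrySet()) {
            String name = (recordName + "." + toDotted(e.getKey())).toLowerCase();
            target.setGauge(name, e.getValue().doubleValue());
        }
    }

    /** "StatAvgTime" -> "stat.avg.time"; names containing "I/O" would need special handling. */
    static String toDotted(String camel) {
        return camel.replaceAll("([a-z0-9])([A-Z])", "$1.$2").toLowerCase();
    }
}
```

Note that forwarding everything as a gauge loses the counter/timer semantics discussed elsewhere in this thread, which is one reason such a bridge might not be worth the effort.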
>
> [1] https://gist.github.com/dlmarion/67e0ed8df320633d5af23ae00d965183
>
> On Thu, Sep 23, 2021 at 1:00 PM Christopher <ct...@apache.org> wrote:
>
> > +1 to everything Ed wrote. :)
> >
> > On Wed, Sep 22, 2021 at 10:03 AM <de...@etcoleman.com> wrote:
> > >
> > > The information provided by Micrometer instrumentation should be
> > > consistent with the values produced by Hadoop metrics. Things like
> > > gauges and counters are straightforward and should match 1:1. Things
> > > that collect or calculate statistics may differ slightly due to
> > > implementation details (say, the way histogram binning is performed);
> > > they will still be mathematically correct and the values they report
> > > should still be consistent, but they might be "different".
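A small JDK-only sketch of how binning can make a statistic "different" while still being correct: the same samples give an exact percentile from the raw data, but a bucketed histogram can only report the upper bound of the bucket containing that rank. The bucket boundaries and the class name are arbitrary assumptions for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch: exact vs. bucketed (histogram) percentile estimates for the same samples. */
class BinningDemo {
    /** Exact nearest-rank percentile over the raw samples. */
    static double exactPercentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Percentile estimate from a histogram with ascending upper bounds:
     * returns the upper bound of the first bucket whose cumulative count
     * reaches the rank. Samples above the last bound land in an implicit +Inf bucket.
     */
    static double bucketedPercentile(double[] samples, double[] upperBounds, double p) {
        long[] counts = new long[upperBounds.length + 1];
        for (double s : samples) {
            int i = 0;
            while (i < upperBounds.length && s > upperBounds[i]) {
                i++;
            }
            counts[i]++;
        }
        long rank = Math.max((long) Math.ceil(p / 100.0 * samples.length), 1L);
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            cumulative += counts[i];
            if (cumulative >= rank) {
                return i < upperBounds.length ? upperBounds[i] : Double.POSITIVE_INFINITY;
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

For the samples 1 through 10 with bucket bounds {4, 8, 16}, the exact median is 5 while the bucketed estimate is 8. The bucketed value is a valid upper bound for the median, just coarser, which is the kind of difference worth documenting.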
> > >
> > > An issue with metrics is that each collection system seems to have
> > > slight variations in the way it wants things collected and reported.
> > > Micrometer supports various monitoring systems and provides a way to
> > > implement others if a particular system is not currently supported. In
> > > Micrometer, each registry handles converting to / supporting a specific
> > > monitoring system. This includes things like name conversions, rate
> > > aggregation (client vs. server), and push vs. pull. Our current metrics
> > > were named for a specific metrics system and its naming convention.
> > > Rather than trying to match our current names exactly, we could follow
> > > the Micrometer naming convention and then rely on the Micrometer
> > > registry conversion to match the user's chosen collection system.
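A minimal sketch of the kind of name conversion a registry performs, starting from a Micrometer-style dotted lowercase name. These converters only approximate snake_case and camelCase; the real conversions (including any unit or `_total` suffixes) are defined by each Micrometer registry module, and Micrometer itself provides `NamingConvention` constants such as `NamingConvention.snakeCase` and `NamingConvention.camelCase`. The metric name used is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical sketch: one canonical dotted name rendered in different backend styles. */
class NameConversion {
    /** "example.scan.time" -> "example_scan_time" (snake_case, as Prometheus-style systems expect). */
    static String snakeCase(String dotted) {
        return dotted.replace('.', '_');
    }

    /** "example.scan.time" -> "exampleScanTime" (camelCase). */
    static String camelCase(String dotted) {
        String[] parts = dotted.split("\\.");
        return parts[0] + Arrays.stream(parts, 1, parts.length)
            .filter(p -> !p.isEmpty())
            .map(p -> Character.toUpperCase(p.charAt(0)) + p.substring(1))
            .collect(Collectors.joining());
    }
}
```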
> > >
> > > Adopting and following the Micrometer conventions should increase our
> > > compatibility with other collection systems and ease user
> > > implementations. In places where this might result in a name change, I
> > > think we should prioritize consistency and normalizing names with
> > > conventions. That would seem to provide the least surprise to end users
> > > and increase their flexibility to meet their needs. We should also look
> > > to take advantage of tagging to allow for aggregation and dimensional
> > > drill-down, increasing utility to end users. To the extent that this
> > > changes a reported metric name, the increased utility and flexibility
> > > would benefit end users. While any name change would increase friction
> > > for current metric consumers, the degree of friction seems independent
> > > of the amount of change; any change might be disruptive. I am not
> > > advocating that we change names just to change them. Rather, we should
> > > make uniform names and consistent naming conventions across our
> > > codebase the primary consideration and allow the reported names to fall
> > > out from there.
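A JDK-only sketch of what tagging buys: one meter name with tags (here hypothetical `table` and `op` dimensions) can be reported per tag combination or rolled up across a dimension. This is not Micrometer's `Tags` API, only the underlying idea of dimensional aggregation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: counters keyed by name plus tags, with roll-up by one tag. */
class TaggedCounters {
    // meter name -> (tag set -> count)
    private final Map<String, Map<Map<String, String>, Long>> counts = new HashMap<>();

    void increment(String name, Map<String, String> tags) {
        counts.computeIfAbsent(name, k -> new HashMap<>())
              .merge(Map.copyOf(tags), 1L, Long::sum);
    }

    /** Sums the named counter across all other dimensions, grouped by the values of one tag. */
    Map<String, Long> sumBy(String name, String tagKey) {
        Map<String, Long> result = new TreeMap<>();
        counts.getOrDefault(name, Map.of()).forEach((tags, count) ->
            result.merge(tags.getOrDefault(tagKey, "none"), count, Long::sum));
        return result;
    }
}
```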
> > >
> > > The configuration of each monitoring system will depend on the system
> > > chosen by the user. We should provide a select set of examples (I
> > > advocate Prometheus, some flavor of statsd, and logging) to guide users,
> > > including those whose requirements are not met by those systems and who
> > > elect to use a different Micrometer module / collection system.
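As a rough illustration of why each example needs its own configuration, here is a JDK-only sketch of one counter rendered for the three suggested targets: a Prometheus-style text exposition line, a StatsD counter line (`name:value|c`), and a log line. The formats are simplified from the public Prometheus and StatsD conventions, the metric name is hypothetical, and no Micrometer registry is claimed to emit exactly these strings.

```java
import java.util.Locale;

/** Hypothetical sketch: one counter observation rendered for three kinds of backends. */
class BackendFormats {
    /** Prometheus-style exposition line: cumulative value, counter names conventionally end in "_total". */
    static String prometheus(String dottedName, double cumulativeCount) {
        return dottedName.replace('.', '_') + "_total " + cumulativeCount;
    }

    /** StatsD counter line: the increment since the last flush, with type "c". */
    static String statsd(String dottedName, long delta) {
        return dottedName + ":" + delta + "|c";
    }

    /** Log line with a per-second rate over the step, similar in spirit to the logging registry output. */
    static String logLine(String dottedName, long delta, double stepSeconds) {
        return String.format(Locale.ROOT, "%s{} throughput=%.1f/s", dottedName, delta / stepSeconds);
    }
}
```

The semantics differ as well as the syntax: a Prometheus server scrapes a cumulative value and computes rates itself, while a StatsD client pushes deltas, which is the client vs. server rate aggregation and push vs. pull distinction made earlier in this message.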
> > >
> > > I agree that we should supply documentation mapping current names to
> > > their Micrometer equivalents. The specific name reported will depend on
> > > the conversions performed for the target system, but those conversions
> > > should be documented by each module and are not within our scope.
> > >
> > > -----Original Message-----
> > > From: Keith Turner <ke...@deenlo.com>
> > > Sent: Tuesday, September 21, 2021 5:07 PM
> > > To: Accumulo Dev List <de...@accumulo.apache.org>
> > > Subject: Re: Metrics Replacement
> > >
> > > On Tue, Sep 21, 2021 at 3:45 PM Dave Marion <dm...@gmail.com> wrote:
> > > >
> > > > There is a WIP pull request against 2.1.0-SNAPSHOT for replacing the
> > > > Hadoop Metrics2 framework with Micrometer[1]. Micrometer suggests using
> > > > a naming pattern[2] for the metrics internally where words are all
> > > > lowercase and separated by a period. Micrometer output formats then
> > > > rewrite the metric names to the destination-specific format. It's
> > > > possible that we may not be able to produce metrics in exactly the
> > > > same way as the Hadoop Metrics2
> > >
> > > Is it only the naming pattern that will cause incompatibility, or is it
> > > more than that? Like, would a timer, gauge, etc. in Micrometer produce
> > > different information/metrics than a timer, gauge, etc. in Hadoop
> > > metrics? I suspect these would differ and that would also impact
> > > compatibility. Will the way in which Accumulo is configured to report
> > > metrics also change? I can't imagine it would be the same, but I have
> > > not looked at the PR.
> > >
> > > Can you provide an example of a naming incompat where it has to change?
> > >
> > > > framework. Metrics are not part of the public API, but we do want to
> > > > try to retain as much backwards compatibility as possible. In the
> > > > event that we cannot get that compatibility, it has been suggested
> > > > that we document how things are different. As I have limited
> > > > knowledge of how the metrics are
> > >
> > > Is there a reasonable path to achieving compatibility? If not, it seems
> > > like documenting what has changed is a good way to go. We could explain
> > > it in detail in the 2.1.0 release notes and link to that from the user
> > > manual.
> > >
> > > > being used today, I'm looking for some feedback from the community as
> > > > to how painful it would be if metric names changed in a minor release.
> > > >
> > > > [1] https://micrometer.io/
> > > > [2] https://micrometer.io/docs/concepts#_naming_meters
> > >
> >
>