Posted to issues@storm.apache.org by "Alessandro Bellina (JIRA)" <ji...@apache.org> on 2016/11/10 18:31:58 UTC

[jira] [Commented] (STORM-2153) New Metrics Reporting API

    [ https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654764#comment-15654764 ] 

Alessandro Bellina commented on STORM-2153:
-------------------------------------------

[~ptgoetz] I added a diagram of what I am working on now here: https://github.com/abellina/storm/blob/new_metrics_diagram/docs/new_metrics_phase_1.png

This is a bit different from the previous discussion, but here is how it works; let me know what you think. (This is separate from reporter configuration; it covers just the built-in reporter.) It is a phased approach: the current phase would still pipe metrics through StatsUtil, although I haven't written that part yet (right now the metrics only get as far as Nimbus).

1. Each executor registers metrics against the metric registry running in the worker.
2. With the default reporter configured, we'd instantiate it. This is a codahale ScheduledReporter that would run, say, once per minute. Currently I run it every 5 seconds, but that's just for testing.
3. The stats are then written to disk on a per-component basis, exactly like the versioned store. We could swap this part out for a better storage mechanism, but that's what I have so far. I call it TimeseriesStore to distinguish the two, but we could merge them.
4. The supervisor has a timer, which I call WorkerStatsTimer, that picks up stats from the stats directory on disk using its instance of TimeseriesStore. This is very much like the heartbeat handling in the supervisor, except the supervisor itself doesn't care about the contents; it just shuttles them over the thrift connection. The stats are deleted once pushed; nothing is kept.
5. With each iteration of WorkerStatsTimer, we send the metrics for all workers to Nimbus via Thrift.
6. For phase I, I was just thinking to take these and make them look like heartbeat stats so that the stats code and the UI can show them. Eventually, once the RocksDB store is ready, we can store them there.
7. I am looking to publish all data available for Timers and Histograms, to see if I can later reconstruct the stats they compute, so that we can look at how to aggregate them.
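The executor-side registration and the periodic reporter (steps 1 and 2 above) can be sketched with the Dropwizard API. This is only a sketch of the idea, not the actual patch; the component/metric names ("myBolt", "emitted", etc.) are illustrative:

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

import java.util.concurrent.TimeUnit;

public class WorkerMetricsSketch {
    public static void main(String[] args) {
        // One registry per worker; executors register their metrics against it.
        MetricRegistry registry = new MetricRegistry();

        // An executor registering a counter, namespaced by component/task.
        Counter emitted = registry.counter(
                MetricRegistry.name("myBolt", "task-3", "emitted"));
        emitted.inc();

        // A codahale ScheduledReporter flushes on a fixed period
        // (once per minute in the proposal; 5 seconds is only for testing).
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
        reporter.start(1, TimeUnit.MINUTES);
        reporter.report();  // one immediate flush for demonstration
        reporter.stop();
    }
}
```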

Thoughts? I am going to take a crack at making this work with our current UI this week, and I should be able to share the code next week. I have to finish up what I am doing, write tests, and document/clean up quite a bit, but I can share before then.

> New Metrics Reporting API
> -------------------------
>
>                 Key: STORM-2153
>                 URL: https://issues.apache.org/jira/browse/STORM-2153
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: P. Taylor Goetz
>
> This is a proposal to provide a new metrics reporting API based on [Coda Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA Dropwizard/Yammer metrics).
> h2. Background
> In a [discussion on the dev@ mailing list | http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3cCAGX0URh85NfH0Pbph11PMc1oof6HTycjCXSxgwP2nnofuKq0pQ@mail.gmail.com%3e]  a number of community and PMC members recommended replacing Storm’s metrics system with a new API as opposed to enhancing the existing metrics system. Some of the objections to the existing metrics API include:
> # Metrics are reported as an untyped Java object, making it very difficult to reason about how to report it (e.g. is it a gauge, a counter, etc.?)
> # It is difficult to determine if metrics coming into the consumer are pre-aggregated or not.
> # Storm’s metrics collection occurs through a specialized bolt, which in addition to potentially affecting system performance, complicates certain types of aggregation when the parallelism of that bolt is greater than one.
> In the discussion on the developer mailing list, there is growing consensus for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics library. This approach has the following benefits:
> # Coda Hale’s metrics library is very stable, performant, well thought out, and widely adopted among open source projects (e.g. Kafka).
> # The metrics library provides many existing metric types: Meters, Gauges, Counters, Histograms, and more.
> # The library has a pluggable “reporter” API for publishing metrics to various systems, with existing implementations for: JMX, console, CSV, SLF4J, Graphite, Ganglia.
> # Reporters are straightforward to implement, and can be reused by any project that uses the metrics library (i.e. they would have broader application outside of Storm).
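> To make the metric types concrete, a minimal sketch of registering each of them with a Dropwizard {{MetricRegistry}} follows. The metric names and values are made up for illustration:

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class MetricTypesSketch {
    public static void main(String[] args) throws Exception {
        MetricRegistry registry = new MetricRegistry();

        registry.counter("acked").inc(10);                  // monotonic count

        Meter tuples = registry.meter("tuples");            // rate of events
        tuples.mark();

        Histogram sizes = registry.histogram("tuple-size"); // value distribution
        sizes.update(512);

        Timer latency = registry.timer("execute-latency");  // rate + duration
        try (Timer.Context ctx = latency.time()) {
            Thread.sleep(5);                                // the timed work
        }

        Gauge<Integer> queueDepth = () -> 42;               // point-in-time value
        registry.register("queue-depth", queueDepth);
    }
}
```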
> As noted earlier, the metrics library supports pluggable reporters for sending metrics data to other systems, and implementing a reporter is fairly straightforward (an example reporter implementation can be found here). For example, if someone develops a reporter based on Coda Hale's metrics, it could be used not only for pushing Storm metrics, but also by any system that uses the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into the following development areas:
> # Implement API for Storm's internal worker metrics: latencies, queue sizes, capacity, etc.
> # Implement API for user defined, topology-specific metrics (exposed via the {{org.apache.storm.task.TopologyContext}} class)
> # Implement API for storm daemons: nimbus, supervisor, etc.
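> For item 2, the user-facing registration shape is still open. The sketch below shows what registering a user-defined metric through {{org.apache.storm.task.TopologyContext}} might look like; the {{registerMeter}} method and the {{MetricAwareContext}} stand-in are hypothetical and do not exist in Storm today:

```java
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical: a minimal stand-in for the proposed TopologyContext additions.
// Neither this interface nor the registerMeter method exists in Storm today.
interface MetricAwareContext {
    Meter registerMeter(String name);
}

public class UserMetricsSketch {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        MetricAwareContext context = registry::meter;

        // Inside a bolt's prepare(), a user might register a meter like this:
        Meter seen = context.registerMeter("my-bolt.tuples-seen");
        seen.mark();  // would be called from execute() per tuple

        System.out.println(seen.getCount()); // prints 1
    }
}
```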
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon completion, the old metrics API would presumably be deprecated, but kept in place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting mechanism. The proposed metrics API would not depend on any of Storm's messaging capabilities and instead use the [metrics library's built-in reporter mechanism | http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This would allow users to use existing {{Reporter}} implementations which are not Storm-specific, and would simplify the process of collecting metrics. Compared to Storm's {{IMetricCollector}} interface, implementing a reporter for the metrics library is much more straightforward (an example can be found [here | https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
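> As a concrete illustration of the built-in reporter mechanism, wiring a stock {{CsvReporter}} to a registry takes only a few lines and requires no Storm-specific code. The output directory and metric name below are illustrative:

```java
import com.codahale.metrics.CsvReporter;
import com.codahale.metrics.MetricRegistry;

import java.io.File;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class BuiltInReporterSketch {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("transferred").inc();

        // A stock reporter from metrics-core; one CSV file per metric.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        CsvReporter reporter = CsvReporter.forRegistry(registry)
                .formatFor(Locale.US)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(dir);
        reporter.start(1, TimeUnit.MINUTES); // flush once per minute
        reporter.report();                   // immediate flush for demonstration
        reporter.stop();
    }
}
```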
> The new metrics capability would not use or affect the ZooKeeper-based metrics used by Storm UI.
> h2. Relationship to JStorm Metrics
> [TBD]
> h2. Target Branches
> [TBD]
> h2. Performance Implications
> [TBD]
> h2. Metrics Namespaces
> [TBD]
> h2. Metrics Collected
> *Worker*
> || Namespace || Metric Type || Description ||
> *Nimbus*
> || Namespace || Metric Type || Description ||
> *Supervisor*
> || Namespace || Metric Type || Description ||
> h2. User-Defined Metrics
> [TBD]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)