Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/10/09 02:28:28 UTC

[jira] [Updated] (STORM-147) UI should use metrics framework

     [ https://issues.apache.org/jira/browse/STORM-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-147:
-------------------------------
    Component/s: storm-core

> UI should use metrics framework
> -------------------------------
>
>                 Key: STORM-147
>                 URL: https://issues.apache.org/jira/browse/STORM-147
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/612
> If I understand correctly, the stats framework is deprecated in favor of the metrics framework. However, the UI currently relies on the older stats framework, and so there are duplicated calls to the stats code all through the critical loops, and several interesting numbers gathered only by the metrics framework (heap info, etc) absent from the UI.
> A CompileAndZookeepMetrics consumer could listen on the metrics stream, assemble data objects that look the same as what the stats framework produces, and serialize them into zookeeper. That lets us remove the stats tracking calls from the executor and makes it easier to add new metrics to the UI, yet doesn't require changes to the underlying UI code. Also, anyone else could use zookeeper or thrift to retrieve that synthesized view of the cluster metrics.
> My thought is to have one metrics compiler per executor and one per worker. Each compiler would maintain a composite object and update it as new metrics roll in. As a new value for say the emitted count is received, it updates that field in-place, leaving all other last-known values. The compiler would clear out its data object on cleanup().
> In the current implementation, the workerbeat has information about the worker and all stats rolled up into a single object. We can have the response of the current get*Info() thrift calls stay the same, but there would be an increase in the number of ZooKeeper calls to build it.
> If this is a welcome feature, I believe @arrawatia is excited to implement it.
> Data objects stored in ZK: one per worker and one per executor? one per worker? or one per compiled metric?
> What tempo should the compiler push its compiled view to Zookeeper: on each metrics update, or on a heartbeat?
> (This may be a relative of #527)
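The per-executor "metrics compiler" described above can be sketched roughly as follows. This is an illustrative Python sketch, not Storm code; the class and method names (MetricsCompiler, handle_data_point, compiled_view) are hypothetical stand-ins for whatever the real consumer would expose.

```python
# Hypothetical sketch of the proposed metrics compiler: it maintains
# a composite, stats-like object and updates individual fields in
# place as new metric values roll in, leaving all other last-known
# values untouched. cleanup() clears the data object, as proposed.

class MetricsCompiler:
    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.snapshot = {}  # last-known value per metric name

    def handle_data_point(self, name, value):
        # Update only the field that changed; every other field
        # keeps its last-known value, so the compiled view stays
        # complete between updates.
        self.snapshot[name] = value

    def compiled_view(self):
        # The object that would be serialized into ZooKeeper for
        # the UI (mimicking what the stats framework produces).
        return dict(self.snapshot)

    def cleanup(self):
        # Clear out the data object on executor shutdown.
        self.snapshot.clear()
```

For example, a new "emitted" count overwrites only that field while "acked" retains its previous value.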
> ----------
> nathanmarz: Yes, I would like to see this work done. I think the best would be:
> One Zookeeper metrics consumer per worker
> All metrics stats get routed to local Zookeeper metrics consumer (should make an explicit localGrouping for this that errors if that executor is not there)
> That metrics consumer updates a single node in ZK representing stats for that worker, the same way it works now.
> It should update ZK after it receives N updates, where N is the number of executors in that worker. That will keep the tempo at approximately the same rate as metrics are emitted.
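The update tempo described above (one ZK write per N executor updates) can be sketched like this. Python sketch with hypothetical names (WorkerMetricsConsumer, on_update); write_to_zk is a stand-in for the real ZooKeeper write.

```python
# Sketch of the proposed tempo: the per-worker ZooKeeper metrics
# consumer buffers incoming executor updates and writes to ZK once
# it has received N of them, where N is the number of executors in
# the worker, keeping the ZK write rate roughly equal to the rate
# at which metrics are emitted.

class WorkerMetricsConsumer:
    def __init__(self, num_executors, write_to_zk):
        self.num_executors = num_executors
        self.write_to_zk = write_to_zk  # callback standing in for the ZK write
        self.pending = []

    def on_update(self, executor_update):
        self.pending.append(executor_update)
        if len(self.pending) >= self.num_executors:
            # One write per "round" of executor updates.
            self.write_to_zk(list(self.pending))
            self.pending.clear()
```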
> ----------
> mrflip: possible:
> make a MetricsZkSummarizer that populates a thrift-generated object with metrics and serializes them back to zookeeper
> make a subclass SystemZkSummarizer for the specific purpose here
> it sends updates on a tempo of one report per producer
> make the UI work with new object
> beautiful:
> Add fields for other interesting numbers to the worker and executor summaries, such as GC and disruptor queues
> display those interesting numbers on UI
> fast:
> make a localGrouping, just like the localOrShuffleGrouping, but which errors rather than doing a shuffle
> In system-topology! (common.clj), add an add-system-zk-summarizer! method to attach the SystemZkSummarizer

> currently, the metrics consumers are attached always with the :shuffle grouping (metrics-consumer-bolt-specs). Modify this to get the grouping from the MetricsConsumer instead.
> btw -- would the default grouping of a MetricsConsumer be better off as :local-or-shuffle, not :shuffle? There doesn't seem to be a reason to deport metrics if a consumer bolt is handy.
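The localGrouping proposed above (like localOrShuffleGrouping, but erroring instead of falling back to a shuffle) amounts to the following. Python sketch with hypothetical names; in Storm a custom grouping would implement this task-selection logic.

```python
# Sketch of the proposed localGrouping semantics: restrict the
# candidate target tasks to those running in the same worker, and
# raise an error rather than shuffling elsewhere when none exists.

def local_grouping(local_task_ids, candidate_task_ids):
    local = [t for t in candidate_task_ids if t in local_task_ids]
    if not local:
        # localOrShuffleGrouping would fall back to a shuffle here;
        # localGrouping treats the absence of a local task as a bug.
        raise RuntimeError("localGrouping: no target task in this worker")
    return local
```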
> ----------
> nathanmarz: Since metrics are written per worker, Storm should actually guarantee that there's one ZK metrics executor per worker. And the reason it should be :local instead of :local-or-shuffle is because of that guarantee – if that executor isn't local then there's a serious problem and there should be an error. The ZK metrics executor should be spawned like the SystemBolt in order to get the one per worker guarantee, and ensure that if the number of workers changes the number of ZKMetrics executors change appropriately.
> ----------
> mrflip: Understood -- my question at the end regarded other MetricsConsumers, not this special one: right now JoeBobMetricConsumer gets shuffle grouping, but I was wondering if it should get local-or-shuffle instead. 
> The SystemZkSummarizer must be local-or-die, and created specially at the same lifecycle as the system bolt.
> ----------
> nathanmarz: Ah, well we should make the type of grouping configurable. fieldsGrouping on executor id is probably the most logical default.
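A fieldsGrouping on executor id, as suggested above, would hash the id to pick a consumer task, so every metric from a given executor lands on the same MetricsConsumer task. A minimal sketch, assuming a stable hash (the function name is illustrative):

```python
import zlib

# Sketch of a fields grouping keyed on executor id: a stable hash
# of the id, modulo the number of consumer tasks, deterministically
# selects the task, so one executor's metrics always go to the same
# MetricsConsumer task.

def fields_grouping_on_executor(executor_id, num_consumer_tasks):
    # crc32 rather than Python's built-in hash(), which is salted
    # per process and so would not be stable across workers.
    return zlib.crc32(executor_id.encode()) % num_consumer_tasks
```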
> ----------
> mrflip: (Addresses #27 )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)