You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by srdo <gi...@git.apache.org> on 2018/09/17 20:52:24 UTC
[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary
GitHub user srdo opened a pull request:
https://github.com/apache/storm/pull/2840
STORM-3147: Add metrics based on ClusterSummary
Rebase and update of https://github.com/apache/storm/pull/2764.
@zd-project Please take a look when you have a chance.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-3147
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2840.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2840
----
commit 02428594b53801572f657f4d0bff85b2213b3723
Author: Zhengdai Hu <hu...@...>
Date: 2018-07-13T19:22:20Z
STORM-3147: Port ClusterSummary to StormMetricsRegistry
commit 392803c9bb2fe81a5df57f695a0e6d7c37bd1f2e
Author: Stig Rohde Døssing <sr...@...>
Date: 2018-09-17T20:21:12Z
STORM-3147: Fix minor nits, rebase to use non-static metrics registry
----
---
[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary
Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/2840#discussion_r218485913
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -4767,4 +4797,205 @@ public IScheduler getForcedScheduler() {
}
+ private static class ClusterSummaryMetrics implements MetricSet {
+ private static final String SUMMARY = "summary";
+ private final Map<String, com.codahale.metrics.Metric> metrics = new HashMap<>();
+
+ public com.codahale.metrics.Metric put(String key, com.codahale.metrics.Metric value) {
+ return metrics.put(MetricRegistry.name(SUMMARY, key), value);
+ }
+
+ @Override
+ public Map<String, com.codahale.metrics.Metric> getMetrics() {
+ return metrics;
+ }
+ }
+
+ private class ClusterSummaryMetricSet implements Runnable {
+ private static final int CACHING_WINDOW = 5;
+
+ private final ClusterSummaryMetrics clusterSummaryMetrics = new ClusterSummaryMetrics();
+
+ private final Function<String, Histogram> registerHistogram = (name) -> {
+ //This histogram reflects the data distribution across only one ClusterSummary, i.e.,
+ // data distribution across all entities of a type (e.g., data from all nimbus/topologies) at one moment.
+ // Hence we use half of the CACHING_WINDOW time to ensure it retains only data from the most recent update
+ final Histogram histogram = new Histogram(new SlidingTimeWindowReservoir(CACHING_WINDOW / 2, TimeUnit.SECONDS));
+ clusterSummaryMetrics.put(name, histogram);
+ return histogram;
+ };
+ private volatile boolean active = false;
+
+ //NImbus metrics distribution
+ private final Histogram nimbusUptime = registerHistogram.apply("nimbuses:uptime-secs");
+
+ //Supervisor metrics distribution
+ private final Histogram supervisorsUptime = registerHistogram.apply("supervisors:uptime-secs");
+ private final Histogram supervisorsNumWorkers = registerHistogram.apply("supervisors:num-workers");
+ private final Histogram supervisorsNumUsedWorkers = registerHistogram.apply("supervisors:num-used-workers");
+ private final Histogram supervisorsUsedMem = registerHistogram.apply("supervisors:used-mem");
+ private final Histogram supervisorsUsedCpu = registerHistogram.apply("supervisors:used-cpu");
+ private final Histogram supervisorsFragmentedMem = registerHistogram.apply("supervisors:fragmented-mem");
+ private final Histogram supervisorsFragmentedCpu = registerHistogram.apply("supervisors:fragmented-cpu");
+
+ //Topology metrics distribution
+ private final Histogram topologiesNumTasks = registerHistogram.apply("topologies:num-tasks");
+ private final Histogram topologiesNumExecutors = registerHistogram.apply("topologies:num-executors");
+ private final Histogram topologiesNumWorker = registerHistogram.apply("topologies:num-workers");
+ private final Histogram topologiesUptime = registerHistogram.apply("topologies:uptime-secs");
+ private final Histogram topologiesReplicationCount = registerHistogram.apply("topologies:replication-count");
+ private final Histogram topologiesRequestedMemOnHeap = registerHistogram.apply("topologies:requested-mem-on-heap");
+ private final Histogram topologiesRequestedMemOffHeap = registerHistogram.apply("topologies:requested-mem-off-heap");
+ private final Histogram topologiesRequestedCpu = registerHistogram.apply("topologies:requested-cpu");
+ private final Histogram topologiesAssignedMemOnHeap = registerHistogram.apply("topologies:assigned-mem-on-heap");
+ private final Histogram topologiesAssignedMemOffHeap = registerHistogram.apply("topologies:assigned-mem-off-heap");
+ private final Histogram topologiesAssignedCpu = registerHistogram.apply("topologies:assigned-cpu");
--- End diff --
I am sorry, but I am really having a hard time of seeing a histogram of these being valuable, but perhaps I am wrong. I would much rather see dimensions/tags being used, but since those are not available yet, I am okay with this.
---
[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary
Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/2840#discussion_r218484299
--- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2922,19 +2948,20 @@ public void launchServer() throws Exception {
}
});
- metricsRegistry.registerGauge("nimbus:num-supervisors", () -> state.supervisors(null).size());
- metricsRegistry.registerGauge("nimbus:fragmented-memory", this::fragmentedMemory);
- metricsRegistry.registerGauge("nimbus:fragmented-cpu", this::fragmentedCpu);
- metricsRegistry.registerGauge("nimbus:available-memory", () -> nodeIdToResources.get().values().parallelStream()
- .mapToDouble(SupervisorResources::getAvailableMem)
+ metricsRegistry.registerGauge("nimbus:total-available-memory-non-negative", () -> nodeIdToResources.get().values()
--- End diff --
There appears to be some oddness with the naming of the metrics. It would be nice to have some kind of convention for them.
Perhaps we can have something like
* 'nimbus:cluster:available-memory-mb' to show that this came from nimbus, it is about the cluster and it is the available memory in MB.
* 'nimbus:cluster:available-cpu-pct'
* 'nimbus:cluster:total-memory-mb'
* 'nimbus:cluster:total-cpu-pct'
etc.
---
[GitHub] storm issue #2840: STORM-3147: Add metrics based on ClusterSummary
Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2840
I worked with @zd-project fairly closely to come up with the goals for the metrics. I just think we can do better on some of the metrics, but it is not something we have to do right now. I am +1 for merging this in as is, but I think we will need some follow on JIRA to clean things up and document them better.
---
[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/storm/pull/2840
---
[GitHub] storm issue #2840: STORM-3147: Add metrics based on ClusterSummary
Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:
https://github.com/apache/storm/pull/2840
I'm hoping @zd-project will help review this, since this is a rebase of his PR. I don't really have the context to know whether these added metrics are useful.
---