You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by srdo <gi...@git.apache.org> on 2018/09/17 20:52:24 UTC

[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/2840

    STORM-3147: Add metrics based on ClusterSummary

    Rebase and update of https://github.com/apache/storm/pull/2764.
    
    @zd-project Please take a look when you have a chance.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-3147

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2840
    
----
commit 02428594b53801572f657f4d0bff85b2213b3723
Author: Zhengdai Hu <hu...@...>
Date:   2018-07-13T19:22:20Z

    STORM-3147: Port ClusterSummary to StormMetricsRegistry

commit 392803c9bb2fe81a5df57f695a0e6d7c37bd1f2e
Author: Stig Rohde Døssing <sr...@...>
Date:   2018-09-17T20:21:12Z

    STORM-3147: Fix minor nits, rebase to use non-static metrics registry

----


---

[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2840#discussion_r218485913
  
    --- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
    @@ -4767,4 +4797,205 @@ public IScheduler getForcedScheduler() {
     
         }
     
    +    private static class ClusterSummaryMetrics implements MetricSet {
    +        private static final String SUMMARY = "summary";
    +        private final Map<String, com.codahale.metrics.Metric> metrics = new HashMap<>();
    +        
    +        public com.codahale.metrics.Metric put(String key, com.codahale.metrics.Metric value) {
    +            return metrics.put(MetricRegistry.name(SUMMARY, key), value);
    +        }
    +
    +        @Override
    +        public Map<String, com.codahale.metrics.Metric> getMetrics() {
    +            return metrics;
    +        }
    +    }
    +    
    +    private class ClusterSummaryMetricSet implements Runnable {
    +        private static final int CACHING_WINDOW = 5;
    +        
    +        private final ClusterSummaryMetrics clusterSummaryMetrics = new ClusterSummaryMetrics();
    +        
    +        private final Function<String, Histogram> registerHistogram = (name) -> {
    +            //This histogram reflects the data distribution across only one ClusterSummary, i.e.,
    +            // data distribution across all entities of a type (e.g., data from all nimbus/topologies) at one moment.
    +            // Hence we use half of the CACHING_WINDOW time to ensure it retains only data from the most recent update
    +            final Histogram histogram = new Histogram(new SlidingTimeWindowReservoir(CACHING_WINDOW / 2, TimeUnit.SECONDS));
    +            clusterSummaryMetrics.put(name, histogram);
    +            return histogram;
    +        };
    +        private volatile boolean active = false;
    +
    +        //NImbus metrics distribution
    +        private final Histogram nimbusUptime = registerHistogram.apply("nimbuses:uptime-secs");
    +
    +        //Supervisor metrics distribution
    +        private final Histogram supervisorsUptime = registerHistogram.apply("supervisors:uptime-secs");
    +        private final Histogram supervisorsNumWorkers = registerHistogram.apply("supervisors:num-workers");
    +        private final Histogram supervisorsNumUsedWorkers = registerHistogram.apply("supervisors:num-used-workers");
    +        private final Histogram supervisorsUsedMem = registerHistogram.apply("supervisors:used-mem");
    +        private final Histogram supervisorsUsedCpu = registerHistogram.apply("supervisors:used-cpu");
    +        private final Histogram supervisorsFragmentedMem = registerHistogram.apply("supervisors:fragmented-mem");
    +        private final Histogram supervisorsFragmentedCpu = registerHistogram.apply("supervisors:fragmented-cpu");
    +
    +        //Topology metrics distribution
    +        private final Histogram topologiesNumTasks = registerHistogram.apply("topologies:num-tasks");
    +        private final Histogram topologiesNumExecutors = registerHistogram.apply("topologies:num-executors");
    +        private final Histogram topologiesNumWorker = registerHistogram.apply("topologies:num-workers");
    +        private final Histogram topologiesUptime = registerHistogram.apply("topologies:uptime-secs");
    +        private final Histogram topologiesReplicationCount = registerHistogram.apply("topologies:replication-count");
    +        private final Histogram topologiesRequestedMemOnHeap = registerHistogram.apply("topologies:requested-mem-on-heap");
    +        private final Histogram topologiesRequestedMemOffHeap = registerHistogram.apply("topologies:requested-mem-off-heap");
    +        private final Histogram topologiesRequestedCpu = registerHistogram.apply("topologies:requested-cpu");
    +        private final Histogram topologiesAssignedMemOnHeap = registerHistogram.apply("topologies:assigned-mem-on-heap");
    +        private final Histogram topologiesAssignedMemOffHeap = registerHistogram.apply("topologies:assigned-mem-off-heap");
    +        private final Histogram topologiesAssignedCpu = registerHistogram.apply("topologies:assigned-cpu");
    --- End diff --
    
    I am sorry, but I am really having a hard time of seeing a histogram of these being valuable, but perhaps I am wrong.  I would much rather see dimensions/tags being used, but since those are not available yet, I am okay with this.


---

[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2840#discussion_r218484299
  
    --- Diff: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
    @@ -2922,19 +2948,20 @@ public void launchServer() throws Exception {
                                             }
                                         });
     
    -            metricsRegistry.registerGauge("nimbus:num-supervisors", () -> state.supervisors(null).size());
    -            metricsRegistry.registerGauge("nimbus:fragmented-memory", this::fragmentedMemory);
    -            metricsRegistry.registerGauge("nimbus:fragmented-cpu", this::fragmentedCpu);
    -            metricsRegistry.registerGauge("nimbus:available-memory", () -> nodeIdToResources.get().values().parallelStream()
    -                .mapToDouble(SupervisorResources::getAvailableMem)
    +            metricsRegistry.registerGauge("nimbus:total-available-memory-non-negative", () -> nodeIdToResources.get().values()
    --- End diff --
    
    There appears to be some oddness with the naming of the metrics.  It would be nice to have some kind of convention for them.
    
    Perhaps we can have something like 
    
     * 'nimbus:cluster:available-memory-mb' to show that this came from nimbus, it is about the cluster and it is the available memory in MB.
     * 'nimbus:cluster:available-cpu-pct'
     * 'nimbus:cluster:total-memory-mb'
     * 'nimbus:cluster:total-cpu-pct'
    
    etc.



---

[GitHub] storm issue #2840: STORM-3147: Add metrics based on ClusterSummary

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2840
  
    I worked with @zd-project fairly closely to come up with the goals for the metrics.  I just think we can do better on some of the metrics, but it is not something we have to do right now.  I am +1 for merging this in as is, but I think we will need some follow on JIRA to clean things up and document them better.


---

[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/2840


---

[GitHub] storm issue #2840: STORM-3147: Add metrics based on ClusterSummary

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2840
  
    I'm hoping @zd-project will help review this, since this is a rebase of his PR. I don't really have the context to know whether these added metrics are useful.


---