Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/08/12 12:44:20 UTC

[jira] [Commented] (FLINK-4389) Expose metrics to Webfrontend

    [ https://issues.apache.org/jira/browse/FLINK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418767#comment-15418767 ] 

ASF GitHub Bot commented on FLINK-4389:
---------------------------------------

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/2363

    [FLINK-4389] Expose metrics to WebFrontend

    This PR exposes metrics to the Webfrontend, as proposed in [FLIP-7](https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface).
    
    This PR builds on top of #2300, meaning that 2866f56 is not part of the PR.
    
    I've split the implementation into 5 commits that implement
    * the generation of a separate scope string for the WebInterface
    * the MetricQueryService, a separate actor running on all Job-/TaskManagers whose main purpose is to create and return a dump of the metrics when queried to do so
    * the MetricStore, a nested data structure used in the WebInterface to store transmitted metrics
    * the MetricFetcher, which is used by the WebInterface to fetch metrics from Job-/TaskManagers
    * various MetricsHandler classes, which handle REST calls requesting specific metrics
    
    ### MetricQueryService
    The MetricQueryService is an actor running inside the MetricRegistry. It acts like an unscheduled reporter that is queried from the outside for a report. The MetricRegistry notifies it of added/removed metrics. The MetricFetcher sends report requests to the JM/TM, which forwards them to the MetricQueryService; the MetricQueryService then answers the MetricFetcher directly.
    
    The report is one big `Object[]`, which contains, for each metric:
     1. the type of the metric, encoded as a byte (so that we know how many values are transmitted)
     2. the fully qualified metric name (based on the separate format)
     3. the value(s) of the metric (turned into Strings for Gauges)
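    
    The flat layout above can be illustrated with a minimal sketch. It is not the code from this PR: the class name, the type tag values, and the number of values per metric type are all assumptions made for illustration. Only the ordering (type byte, name, values) and the stringification of Gauge values come from the description.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    
    class MetricDumpSketch {
        // Hypothetical type tags; the byte values actually used are not stated in the PR.
        static final byte TYPE_COUNTER = 0;
        static final byte TYPE_GAUGE = 1;
        static final byte TYPE_HISTOGRAM = 2;
    
        // Assumed number of values that follow the name for each metric type.
        static int valueCount(byte type) {
            switch (type) {
                case TYPE_COUNTER: return 1;
                case TYPE_GAUGE: return 1;
                case TYPE_HISTOGRAM: return 3; // e.g. min, max, mean (assumption)
                default: throw new IllegalArgumentException("unknown type " + type);
            }
        }
    
        // Appends one metric as: type byte, name, value(s).
        static void append(List<Object> dump, byte type, String name, Object... values) {
            if (values.length != valueCount(type)) {
                throw new IllegalArgumentException("wrong number of values for " + name);
            }
            dump.add(type);
            dump.add(name);
            for (Object v : values) {
                // Gauge values are turned into Strings, as described above.
                dump.add(type == TYPE_GAUGE ? String.valueOf(v) : v);
            }
        }
    
        // Walks the flat array; the type byte tells us how many values to read.
        static Map<String, Object[]> decode(Object[] dump) {
            Map<String, Object[]> result = new LinkedHashMap<>();
            int i = 0;
            while (i < dump.length) {
                byte type = (Byte) dump[i++];
                String name = (String) dump[i++];
                int n = valueCount(type);
                result.put(name, Arrays.copyOfRange(dump, i, i + n));
                i += n;
            }
            return result;
        }
    
        public static void main(String[] args) {
            List<Object> dump = new ArrayList<>();
            append(dump, TYPE_COUNTER, "metric_1", 4L);
            append(dump, TYPE_GAUGE, "metric_2", 0.5);
            decode(dump.toArray()).forEach(
                (name, values) -> System.out.println(name + " = " + Arrays.toString(values)));
        }
    }
    ```
    
    Because the array has no per-entry length field, the receiver can only stay aligned if both sides agree on the value count for every type byte, which is why the type is transmitted first.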
    
    ### MetricStore
    The MetricStore is a relatively simple nested data structure that contains one `HashMap<String, Object>` for every JM/TM/job/task. Received metrics are added to these HashMaps based on the format string. There is only a single MetricStore instance in the WebInterface.
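    
    As a rough sketch of such a nested structure (not the class from this PR), the following keeps one map per JM, TM, job and task. All class, field and method names are hypothetical, and the parsing of the format string is left out: callers pass the already-resolved IDs.
    
    ```java
    import java.util.HashMap;
    import java.util.Map;
    
    class MetricStoreSketch {
        // One map for the JobManager, one per TaskManager, one per job and per task.
        final Map<String, Object> jobManager = new HashMap<>();
        final Map<String, Map<String, Object>> taskManagers = new HashMap<>();
        final Map<String, JobStore> jobs = new HashMap<>();
    
        // Hypothetical per-job container: job-level metrics plus one map per task.
        static class JobStore {
            final Map<String, Object> metrics = new HashMap<>();
            final Map<String, Map<String, Object>> tasks = new HashMap<>();
        }
    
        void addJobManagerMetric(String name, Object value) {
            jobManager.put(name, value);
        }
    
        void addTaskManagerMetric(String tmId, String name, Object value) {
            taskManagers.computeIfAbsent(tmId, k -> new HashMap<>()).put(name, value);
        }
    
        void addJobMetric(String jobId, String name, Object value) {
            jobs.computeIfAbsent(jobId, k -> new JobStore()).metrics.put(name, value);
        }
    
        void addTaskMetric(String jobId, String taskId, String name, Object value) {
            jobs.computeIfAbsent(jobId, k -> new JobStore())
                .tasks.computeIfAbsent(taskId, k -> new HashMap<>())
                .put(name, value);
        }
    
        public static void main(String[] args) {
            MetricStoreSketch store = new MetricStoreSketch();
            store.addTaskManagerMetric("tm_1", "metric_1", "4");
            store.addTaskMetric("job_1", "task_1", "metric_2", "7");
            System.out.println(store.taskManagers);
            System.out.println(store.jobs.get("job_1").tasks);
        }
    }
    ```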
    
    ### MetricFetcher
    The MetricFetcher initiates the transfer and cleanup of metrics. It contains the MetricStore instance, which is accessed by the MetricsHandlers. Fetching is only done when a handler asks for it, with a minimum interval of 10 seconds between updates. As a result, no fetching is done unless the metrics are accessed with REST calls.
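    
    The on-demand behaviour with a minimum interval could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: the class name, the injected clock and the `Runnable` standing in for the actual fetch are hypothetical; only the 10 second interval is taken from the description.
    
    ```java
    import java.util.function.LongSupplier;
    
    class ThrottledFetcherSketch {
        static final long MIN_INTERVAL_MS = 10_000L; // 10 seconds, as stated above
    
        private final LongSupplier clock; // injected so the sketch can be tested
        private final Runnable fetch;     // stands in for the real fetch procedure
        private boolean fetchedOnce = false;
        private long lastUpdate;
    
        ThrottledFetcherSketch(LongSupplier clock, Runnable fetch) {
            this.clock = clock;
            this.fetch = fetch;
        }
    
        // Called by a handler; returns true if a fetch was actually triggered.
        synchronized boolean update() {
            long now = clock.getAsLong();
            if (fetchedOnce && now - lastUpdate < MIN_INTERVAL_MS) {
                return false; // too soon, serve what is already in the store
            }
            fetchedOnce = true;
            lastUpdate = now;
            fetch.run();
            return true;
        }
    
        public static void main(String[] args) {
            ThrottledFetcherSketch fetcher = new ThrottledFetcherSketch(
                System::currentTimeMillis, () -> System.out.println("fetching metrics"));
            fetcher.update(); // triggers a fetch
            fetcher.update(); // skipped, less than 10 seconds later
        }
    }
    ```
    
    Since nothing calls `update()` on a timer, an idle WebInterface causes no traffic to the Job-/TaskManagers.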
    
    The fetching procedure can be summed up in pseudo-code as follows:
    ```
    fetch():
    	askJobManagerForJobDetails()
    		=> retain all metrics belonging to the given jobs
    	askJobManagerForMetrics()
    		=> add received metrics to MetricStore
    	askJobManagerForRegisteredTaskManagers()
    		=> retain all metrics belonging to registered task managers
    		=> for each TaskManager:
    			askTaskManagerForMetrics()
    				=> add received metrics to MetricStore
    ```
    
    ### MetricsHandler
    The MetricsHandlers deal with two requests:
    * getAllAvailableMetrics - any REST request that does not have a `get` query parameter is treated as a request for all available metrics for a given JM/TM/job/task, denoted by the REST path. The reply will be a JSON array, for example: `[{"id":"metric_1"},{"id":"metric_2"}]`
    * getMetricValues - the Webfrontend can request the values for several metrics by passing a comma-separated list of metric IDs as the `get` query parameter. The reply will be a JSON array of id:value pairs, for example: `[{"id":"metric_1", "value":"4"}]` or an empty string if an error occurred.
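    
    The dispatch between the two requests can be sketched as follows. This is a simplified illustration, not the handler code from this PR: the class and method names are hypothetical, the JSON is assembled by hand with minimal escaping instead of a JSON library, and silently skipping unknown metric IDs is an assumption.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    
    class MetricsHandlerSketch {
        // getParam is the raw value of the `get` query parameter, or null if absent.
        static String handle(Map<String, Object> metrics, String getParam) {
            if (getParam == null) {
                return listAvailable(metrics);
            }
            return listValues(metrics, getParam.split(","));
        }
    
        // getAllAvailableMetrics: one {"id":...} object per stored metric.
        static String listAvailable(Map<String, Object> metrics) {
            StringBuilder sb = new StringBuilder("[");
            for (String id : metrics.keySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append("{\"id\":\"").append(escape(id)).append("\"}");
            }
            return sb.append(']').toString();
        }
    
        // getMetricValues: one id/value pair per requested metric that exists.
        static String listValues(Map<String, Object> metrics, String[] ids) {
            StringBuilder sb = new StringBuilder("[");
            for (String id : ids) {
                Object value = metrics.get(id);
                if (value == null) continue; // assumption: unknown IDs are skipped
                if (sb.length() > 1) sb.append(',');
                sb.append("{\"id\":\"").append(escape(id))
                  .append("\",\"value\":\"").append(escape(String.valueOf(value)))
                  .append("\"}");
            }
            return sb.append(']').toString();
        }
    
        // Minimal escaping of backslashes and quotes; a real handler would use a JSON library.
        static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    
        public static void main(String[] args) {
            Map<String, Object> metrics = new LinkedHashMap<>();
            metrics.put("metric_1", 4);
            metrics.put("metric_2", "idle");
            System.out.println(handle(metrics, null));
            System.out.println(handle(metrics, "metric_1,metric_2"));
        }
    }
    ```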

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 4389_metrics_exposed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2363
    
----
commit ea0e4d892717f042acf26ec9653a2371d7b21028
Author: zentol <ch...@apache.org>
Date:   2016-07-27T09:25:27Z

    [FLINK-4245] Expose all defined variables

commit ea1154644566f8009ccda64a0acbdde7d59ad235
Author: zentol <ch...@apache.org>
Date:   2016-08-05T11:54:37Z

    Implement Query Scope
    
    Modifies various MetricGroups to return a separate scope for the query service.

commit 3791a94529d703351dffb284ed3d5d19f1ce272c
Author: zentol <ch...@apache.org>
Date:   2016-08-05T11:49:10Z

    Implement MetricQueryService
    
    Used on the JM/TM to create a key-value representation of all metrics.

commit a0e1418decc8a3a4b53da15dc744f1702247db9f
Author: zentol <ch...@apache.org>
Date:   2016-08-05T11:48:06Z

    Implement MetricStore
    
    Data structure used in the WebInterface to store the transmitted metrics.

commit 2bab6cc32c139f5969a276e385ed5afd6c6a46ea
Author: zentol <ch...@apache.org>
Date:   2016-08-08T12:52:01Z

    Implement MetricFetcher
    
    The MetricFetcher regularly fetches metrics from the JM and all TM's.

commit de4aeaf1e0958b49531adae198345b87ccd260bd
Author: zentol <ch...@apache.org>
Date:   2016-08-05T11:48:22Z

    Implement various MetricsHandler
    
    Handlers that answers metric related queries.

----


> Expose metrics to Webfrontend
> -----------------------------
>
>                 Key: FLINK-4389
>                 URL: https://issues.apache.org/jira/browse/FLINK-4389
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Metrics, Webfrontend
>    Affects Versions: 1.1.0
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>             Fix For: pre-apache
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-7%3A+Expose+metrics+to+WebInterface


