
Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2021/01/29 10:36:00 UTC

[jira] [Assigned] (FLINK-11742) Push metrics to Pushgateway without "instance"

     [ https://issues.apache.org/jira/browse/FLINK-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler reassigned FLINK-11742:
----------------------------------------

    Assignee:     (was: Tom Goong)

> Push metrics to Pushgateway without "instance"
> ----------------------------------------------
>
>                 Key: FLINK-11742
>                 URL: https://issues.apache.org/jira/browse/FLINK-11742
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics
>            Reporter: Tom Goong
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2019-02-25-17-16-28-618.png, image-2019-02-25-17-16-59-034.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to the official documentation,
> [https://prometheus.io/docs/concepts/jobs_instances/]
> [https://github.com/prometheus/pushgateway]
> metrics pushed to the Prometheus Pushgateway need to carry an "instance" label.
>  In practice, when the "instance" label is missing, Prometheus does not store the metrics correctly: the series are not continuous and a lot of data is lost. After adding "instance", the data returns to normal.
>  
> Without "instance":
> !image-2019-02-25-17-16-28-618.png!
>  
> With "instance":
> !image-2019-02-25-17-16-59-034.png!
>  
>  
> {quote}In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose, a process replicated for scalability or reliability for example, is called a job.
> {quote}
> {quote}For example, an API server job with four replicated instances:
> job: api-server
> -- instance 1: 1.2.3.4:5670
> -- instance 2: 1.2.3.4:5671
> -- instance 3: 5.6.7.8:5670
> -- instance 4: 5.6.7.8:5671
> {quote}
> [https://prometheus.io/docs/concepts/jobs_instances/#jobs-and-instances]
> I think a Flink job corresponds to a Prometheus job, and each TaskManager and JobManager corresponds to a separate instance. If the jobName is used as the instance label, the same metrics from different TaskManagers will conflict, and aggregations such as sum will fail.
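
The conflict described above can be illustrated with a small sketch. It is not Flink's reporter code. It assumes only two documented Pushgateway behaviours: the grouping key is encoded in the URL path as /metrics/job/<job>/<label>/<value>, and a PUT replaces every metric stored under that grouping key. The job name, process names, metric values and the in-memory FakePushgateway class are all hypothetical, and label values are assumed to be URL-safe.

```python
def grouping_path(job, labels=None):
    """Build the Pushgateway URL path for a grouping key.

    Assumes job and label values need no escaping (no "/" etc.).
    """
    parts = ["metrics", "job", job]
    for name, value in (labels or {}).items():
        parts += [name, value]
    return "/" + "/".join(parts)


class FakePushgateway:
    """Hypothetical in-memory stand-in: a PUT replaces the whole group."""

    def __init__(self):
        self.groups = {}

    def put(self, path, metrics):
        # Same path means same grouping key, so the earlier push is overwritten.
        self.groups[path] = dict(metrics)

    def total(self, metric):
        # What a sum over all stored groups would see.
        return sum(group.get(metric, 0) for group in self.groups.values())


# Hypothetical job name and per-process counter values.
JOB = "wordcount"
pushes = {"taskmanager-1": 100, "taskmanager-2": 250}

# Every process pushes under the same grouping key (job only).
shared = FakePushgateway()
for records in pushes.values():
    shared.put(grouping_path(JOB), {"numRecordsIn": records})

# Every process pushes under its own "instance" label.
distinct = FakePushgateway()
for process, records in pushes.items():
    path = grouping_path(JOB, {"instance": process})
    distinct.put(path, {"numRecordsIn": records})

print(len(shared.groups), shared.total("numRecordsIn"))
print(len(distinct.groups), distinct.total("numRecordsIn"))
```

With a shared grouping key, the second push overwrites the first, so only one group survives and a sum sees a single TaskManager's value. With a distinct "instance" per process, both groups are kept and the sum covers both.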



--
This message was sent by Atlassian Jira
(v8.3.4#803005)