Posted to user@flink.apache.org by Morgan Geldenhuys <mo...@tu-berlin.de> on 2020/02/18 16:00:51 UTC

Identifying Flink Operators of the Latency Metric

Hi All,

I have set up monitoring for Flink (1.9.2) via Prometheus and am
interested in viewing the end-to-end latency at the sink operators at
the 95th percentile. I have enabled latency markers at the operator level
and can see the results; one of the entries looks as follows:

flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="flink",component="taskmanager",host="flink_taskmanager_6bdc8fc49_kr4bs",instance="10.244.18.2:9999",job="kubernetes-pods",job_id="96d32d8e380dc267bd69403fd7e20adf",job_name="Traffic",kubernetes_namespace="default",kubernetes_pod_name="flink-taskmanager-6bdc8fc49-kr4bs",operator_id="2e32dc82f03b1df764824a4773219c97",operator_subtask_index="7",pod_template_hash="6bdc8fc49",quantile="0.95",source_id="cbc357ccb763df2852fee8c4fc7d55f2",tm_id="7fb02c0ed734ed1815fa51373457434f"}

That is great; however, I am unable to determine which of the
operators is the sink operator I'm looking for based solely on the
operator_id. Is there a way of determining this?

Regards,
M.

Re: Identifying Flink Operators of the Latency Metric

Posted by Robert Metzger <rm...@apache.org>.
Hey Morgan,

I would query the Monitoring REST API:
https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html

For example:
GET http://localhost:8082/jobs/9a6748889bf24987495eead247aeb1ff
Returns:

{
  "jid": "9a6748889bf24987495eead247aeb1ff",
  "name": "CarTopSpeedWindowingExample",
  "isStoppable": false,
  "state": "RUNNING",
  "start-time": 1582192403413,
  "end-time": -1,
  "duration": 18533,
  "now": 1582192421946,
  "timestamps": {"FINISHED": 0, "FAILING": 0, "CANCELED": 0, "SUSPENDED": 0,
                 "RUNNING": 1582192403550, "RECONCILING": 0, "FAILED": 0, …},
  "vertices": [
    {
      "id": "cbc357ccb763df2852fee8c4fc7d55f2",
      "name": "Source: Custom Source -> Timestamps/Watermarks",
      "parallelism": 1,
      "status": "RUNNING",
      "start-time": 1582192403754,
      "end-time": -1,
      "duration": 18192,
      "tasks": {"CREATED": 0, "CANCELED": 0, "RECONCILING": 0, "FAILED": 0,
                "CANCELING": 0, "DEPLOYING": 0, "RUNNING": 1, …},
      "metrics": {"read-bytes": 0, "read-bytes-complete": true,
                  "write-bytes": 0, "write-bytes-complete": true, "read-records": 0, …}
    },
    {
      "id": "90bea66de1c231edf33913ecd54406c1",
      "name": "Window(GlobalWindows(), DeltaTrigger, TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print to Std. Out",
      "parallelism": 1,
      "status": "RUNNING",
      "start-time": 1582192403759,
      "end-time": -1,
      "duration": 18187,
      "tasks": {"CREATED": 0, "CANCELED": 0, "RECONCILING": 0, "FAILED": 0,
                "CANCELING": 0, "DEPLOYING": 0, "RUNNING": 1, …},
      "metrics": {"read-bytes": 4669, "read-bytes-complete": true,
                  "write-bytes": 0, "write-bytes-complete": true, …}
    }
  ],
  "status-counts": {"CREATED": 0, "CANCELED": 0, "RECONCILING": 0, "FAILED": 0,
                    "CANCELING": 0, "DEPLOYING": 0, "RUNNING": 2, …},
  "plan": {"jid": "9a6748889bf24987495eead247aeb1ff", "name": "CarTopSpeedWindowingExample", …}
}

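If you want to resolve those ids programmatically (for example to annotate your dashboards), a small script along the following lines should do it. This is only a sketch: the JobManager address is a placeholder for your own setup, and the job id is the job_id label from your metric.

import json
from urllib.request import urlopen

# Placeholder JobManager REST endpoint; adjust host/port for your cluster.
JOBMANAGER = "http://localhost:8081"
# The job_id label taken from the latency metric above.
JOB_ID = "96d32d8e380dc267bd69403fd7e20adf"

with urlopen(f"{JOBMANAGER}/jobs/{JOB_ID}") as resp:
    job = json.load(resp)

# Each entry in "vertices" is one (possibly chained) group of operators.
# Compare its "id" with the source_id / operator_id labels of the latency metric.
for vertex in job["vertices"]:
    print(vertex["id"], "->", vertex["name"])

The names come straight from the job graph, so the vertex whose name contains your sink is the one to match against.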
