
Posted to dev@yunikorn.apache.org by "Weiwei Yang (Jira)" <ji...@apache.org> on 2021/05/11 20:47:00 UTC

[jira] [Resolved] (YUNIKORN-646) Add metrics implementation: "allocating_latency_seconds"

     [ https://issues.apache.org/jira/browse/YUNIKORN-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-646.
----------------------------------
    Fix Version/s: 0.11
       Resolution: Fixed

> Add metrics implementation: "allocating_latency_seconds"
> --------------------------------------------------------
>
>                 Key: YUNIKORN-646
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-646
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - common
>            Reporter: Chenya Zhang
>            Assignee: Chenya Zhang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.11
>
>
> Observation:
>  # The container allocation latency metric stays at 0, while the number of allocation attempts fluctuates normally.
>  # Scheduler metric definitions are inconsistent and sometimes hard to understand.
> Root cause analysis:
>  # The metric "allocating_latency_seconds" is not fully implemented, or its implementation was lost in recent releases. For example, ObserveSchedulingLatency() is currently not called when allocating containers.
>  # Scheduler metrics were implemented by multiple developers over time without following a shared convention.
> Improvement Plan:
>  # The top-level container allocation latency can be captured by the main scheduling routine in {{scheduler/context.go}}. Reason: the {{schedule()}} method in {{scheduler/context.go}} is the entry point that processes each partition in the scheduler and walks over each queue and app to check whether anything can be scheduled.
>  # The metric name "allocating_latency_seconds" can be changed to "scheduling_latency_seconds". Reason: the metric is originally defined as "schedulingLatency" in {{metrics/scheduler.go}}, and consistent naming helps avoid confusion.
>  # Other metric definitions and help messages can be improved to make {{metrics/scheduler.go}} consistent. (Open to creating a separate PR for the refactoring work.)
>  # New metrics can be added later to monitor lower-level latency as the scheduler iterates over the partition list, queues, applications, requests, etc. Not included in this PR.
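
Illustration of the first improvement item, as a rough sketch only and not the actual YuniKorn code: the scheduling entry point can take a timestamp before each partition's allocation attempt and record the elapsed seconds afterwards. Everything named here is an assumption made for the example: latencyRecorder is a hypothetical stand-in for the real scheduler metrics (it only keeps a count and a sum), and schedule, tryAllocate and the partition names are invented and do not mirror the signatures in scheduler/context.go.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// latencyRecorder is a hypothetical stand-in for a latency histogram.
// It only keeps the number of observations and their sum in seconds.
type latencyRecorder struct {
	mu    sync.Mutex
	count int
	sum   float64
}

// observeSince records the seconds elapsed since start.
func (r *latencyRecorder) observeSince(start time.Time) {
	elapsed := time.Since(start).Seconds()
	r.mu.Lock()
	defer r.mu.Unlock()
	r.count++
	r.sum += elapsed
}

// snapshot returns the number of observations and their total in seconds.
func (r *latencyRecorder) snapshot() (int, float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.count, r.sum
}

// schedule is a hypothetical scheduling pass. It times the allocation
// attempt for each partition and records the latency whether or not
// the attempt allocated anything. It returns the number of allocations.
func schedule(partitions []string, tryAllocate func(string) bool, rec *latencyRecorder) int {
	allocated := 0
	for _, p := range partitions {
		start := time.Now()
		if tryAllocate(p) {
			allocated++
		}
		rec.observeSince(start)
	}
	return allocated
}

func main() {
	rec := &latencyRecorder{}
	// Assumed allocator: sleeps briefly and only succeeds for "default".
	n := schedule([]string{"default", "gpu"}, func(p string) bool {
		time.Sleep(time.Millisecond)
		return p == "default"
	}, rec)
	count, sum := rec.snapshot()
	fmt.Printf("allocated=%d observations=%d total_seconds=%.4f\n", n, count, sum)
}
```

In this sketch the observation is made on every pass, so the recorded latency cannot stay at 0 while allocation attempts are happening. Whether the real metric should also cover passes that allocate nothing is a design choice this example does not settle.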



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
