Posted to issues@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2020/03/26 04:00:00 UTC

[jira] [Commented] (YUNIKORN-14) Add rest API to retrieve app/container history info

    [ https://issues.apache.org/jira/browse/YUNIKORN-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067342#comment-17067342 ] 

Wilfred Spiegelenburg commented on YUNIKORN-14:
-----------------------------------------------

I looked at the PR and want to propose a different approach, as I see a number of issues.
I mention tracking application details in the text, but I am not sure that is needed in the first instance. It would still fit in the design if we want to add it in a second step.

I think history should be part of {{common}} or the {{scheduler}}, not the {{cache}}. I would expect multiple generic collectors that collect history data: one generic collector is started per partition, like the {{PartitionManager}}, in its own goroutine. History and all tracking is always per partition and will never go above that level.

The current implementation uses a pull mechanism to collect the data from the partition. That requires locking the partition on retrieval (the locks are currently missing from the solution) and could thus impact scheduling performance if the web interface gets lots of requests. Retrieving the history should not touch the partition at all: the data should be kept in the collector and retrieved from there.
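
A rough sketch of what "retrieved from the collector" could look like. The {{Entry}} and {{Store}} types and the JSON field names are assumptions for illustration only, not the data format the UI needs. The point is that the store has its own lock, so a web request never takes the partition lock.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// Entry is a hypothetical history record held by the collector.
type Entry struct {
	Timestamp  int64 `json:"timestamp"` // Unix seconds
	Containers int   `json:"containers"`
}

// Store is the collector's private copy of the history. It has its own
// lock, so readers never touch the partition.
type Store struct {
	mu      sync.RWMutex
	entries []Entry
}

// Append is meant to be called by the collector only.
func (s *Store) Append(e Entry) {
	s.mu.Lock()
	s.entries = append(s.entries, e)
	s.mu.Unlock()
}

// Snapshot returns a copy so callers cannot modify the stored history.
func (s *Store) Snapshot() []Entry {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Entry, len(s.entries))
	copy(out, s.entries)
	return out
}

// ServeHTTP answers a web request from the store alone.
func (s *Store) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(s.Snapshot()); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
```

Because {{Store}} implements {{http.Handler}}, it could be registered under whatever endpoint we settle on. A burst of requests then only contends on the read lock of the store.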

A deeper change: why does the history only cover top-level partition data? Getting info out for queues or nodes is just as important going forward. I also see an omission here: we lose the history data as soon as we remove the partition. It will thus not show us the real history for a time period, just the history of the current state going back a fixed time. That becomes even more important when we look at queues, nodes or applications. Going forward we need to be able to track and maintain the history data for a period of time, independent of the removal of the partition/node/queue/application.

Tracking history should not be limited by the number of entries but by the time range that we need to keep (24 hours as an example). We need at least one history entry per minute, and maybe we even need to go to a 30 or 15 second split. Longer intervals mean we could too easily miss short-running containers or applications. The other solution would be to push from the different tracked objects into a channel that is read by the history collector. That would mean we do not miss any info, but the implementation becomes a bit trickier. We can still sum up to give stats per time range, and that would then become easier to manage for small intervals. It would also not be "on demand" but based on the internal timing of the history collector.
All changes for the things we need to track already run through the partition info, so we would just need to instrument one object to keep track of all of them.

Thoughts?

> Add rest API to retrieve app/container history info
> ---------------------------------------------------
>
>                 Key: YUNIKORN-14
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-14
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Adam Antal
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Yunikorn_UI.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of the web UI we can show application and container history.
> The current pages are mocked up and do not show the real history. Before the changes can be made on the web UI side we need to provide the history via a REST interface so it can be consumed by the UI.
> All web service code is located in package [https://github.com/apache/incubator-yunikorn-core/tree/master/pkg/webservice]. When running the scheduler locally (from K8shim using "make run"), the REST APIs can be accessed via
>  * [http://localhost:9080/ws/v1/apps]
>  * [http://localhost:9080/ws/v1/queues]
>  * [http://localhost:9080/ws/v1/nodes]
> We need to add another endpoint to provide data to yunikorn-web to render the app/container history page. Please check with [~akhilpb] for the desired data format, etc. That issue is tracked via YUNIKORN-8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
