Posted to issues@yunikorn.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2022/06/23 14:20:00 UTC
[jira] [Commented] (YUNIKORN-1233) REST api shows negative request time for some allocations
[ https://issues.apache.org/jira/browse/YUNIKORN-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558109#comment-17558109 ]
Peter Bacsko commented on YUNIKORN-1233:
----------------------------------------
I looked at this in a bit more detail.
When an {{AllocationAsk}} is created as part of the normal flow, the call hierarchy is {{ClusterContext.handleRMUpdateAllocationEvent()}} -> {{ClusterContext.processAsks()}} -> {{PartitionContext.addAllocationAsk()}} -> {{NewAllocationAsk()}}. This path always sets the {{createTime}} variable.
Unfortunately, the creation time is not saved anywhere, so during recovery we reconstruct these objects through a different path: {{ClusterContext.addNode()}} -> {{ClusterContext.convertAllocations()}} -> {{NewAllocationFromSI()}}. That function creates the object directly with struct initialization and never sets {{createTime}}, so the field keeps its zero value.
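To illustrate why an unset {{createTime}} shows up as a negative number, here is a minimal standalone Go sketch. The type {{allocationAsk}} and the helpers {{requestTimeNanos()}} and {{allocationDelayNanos()}} are hypothetical stand-ins, not the actual YuniKorn code; the sketch only assumes that the REST layer reports {{createTime}} as nanoseconds since the epoch. The Go documentation calls the result of {{UnixNano()}} undefined for dates before 1678, but with the current standard library the conversion simply wraps around, which reproduces the values from the dump in this issue.

```go
package main

import (
	"fmt"
	"time"
)

// allocationAsk is a hypothetical stand-in for the scheduler object;
// only the field relevant to this issue is modelled.
type allocationAsk struct {
	createTime time.Time
}

// requestTimeNanos assumes the REST layer exposes createTime as
// nanoseconds since the Unix epoch.
func requestTimeNanos(ask allocationAsk) int64 {
	return ask.createTime.UnixNano()
}

// allocationDelayNanos assumes the delay is reported as
// allocation time minus request time, both in nanoseconds.
func allocationDelayNanos(ask allocationAsk, allocationTime int64) int64 {
	return allocationTime - requestTimeNanos(ask)
}

func main() {
	// Struct initialization that never sets createTime, as on the recovery path.
	recovered := allocationAsk{}

	fmt.Println(recovered.createTime.IsZero()) // true
	// The zero time is January 1, year 1, which does not fit into int64 nanoseconds.
	fmt.Println(requestTimeNanos(recovered)) // -6795364578871345152
	// allocationTime taken from the dump in this issue.
	fmt.Println(allocationDelayNanos(recovered, 1654635352762302689)) // 8449999931633647841
}
```

Both printed numbers match the {{requestTime}} and {{allocationDelay}} in the reported dump, which is consistent with {{createTime}} being left at its zero value.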
Right now, the only solution I can see is setting {{createTime}} to the {{creationTime}} of the pod. This will result in an {{allocationDelay}} of 0 for every recovered allocation, so it's far from perfect. Maybe a WARN message during recovery would be helpful to indicate that this data is lost after a recovery and that we only approximate it. Otherwise we have to save it somewhere, which is more complicated, and I don't think it's worth it.
> REST api shows negative request time for some allocations
> ---------------------------------------------------------
>
> Key: YUNIKORN-1233
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1233
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - common
> Affects Versions: 1.0.0
> Reporter: Wilfred Spiegelenburg
> Assignee: Peter Bacsko
> Priority: Major
>
> In a dump taken from an install after 1.0 we show negative requestTime entries for a small set of pods:
> {code:java}
> {
> "allocationKey": "0c381096-2ec2-45e3-ac5f-29b4bfdd82b2",
> "allocationTags": {
> ...
> "kubernetes.io/label/disableStateAware": "true",
> "kubernetes.io/label/queue": "root.default",
> "kubernetes.io/meta/namespace": "default",
> ...
> "yunikorn.apache.org/requiredNode": "ip-172-31-214-45.ec2.internal"
> },
> "requestTime": -6795364578871345152,
> "allocationTime": 1654635352762302689,
> "allocationDelay": 8449999931633647841,
> "uuid": "0c381096-2ec2-45e3-ac5f-29b4bfdd82b2",
> "resource": {
> "memory": 52428800,
> "vcore": 50
> },
> "priority": "0",
> "queueName": "",
> "nodeId": "ip-172-31-214-45.ec2.internal",
> "applicationId": "yunikorn-default-autogen",
> "partition": "default",
> "placeholder": false,
> "placeholderUsed": false,
> "taskGroupName": ""
> },
> {code}
> This could be related to the fact that they are daemonset pods, as they have the tag "yunikorn.apache.org/requiredNode" set.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)