Posted to issues@yunikorn.apache.org by "Chung En Lee (Jira)" <ji...@apache.org> on 2022/03/04 09:34:00 UTC

[jira] [Assigned] (YUNIKORN-1105) Rethink memory resource conversion to MB

     [ https://issues.apache.org/jira/browse/YUNIKORN-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chung En Lee reassigned YUNIKORN-1105:
--------------------------------------

    Assignee: Chung En Lee

> Rethink memory resource conversion to MB
> ----------------------------------------
>
>                 Key: YUNIKORN-1105
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1105
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Chung En Lee
>            Priority: Major
>
> The choice to represent memory in units of MB rather than bytes has a side effect. We convert a pod or node memory size into whole MB (base 10, so 1MB is 1,000,000 bytes).
> Not every size is expressible in whole MB, so we round up to the nearest MB: 1 byte over and we count the whole MB. This means that a node can look larger in YuniKorn than it really is. The difference is less than 1MB, but it can still decide whether a pod just fits or just does not fit.
> As an example: a pod needs exactly 10MB, i.e. 10,000,000 bytes. A node in YuniKorn shows 10MB free, but in reality it has only 9,000,001 bytes free, not 10,000,000.
> It can also happen the other way around. The pod asks for 9,000,001 bytes, which YuniKorn sees as 10MB. The node in YuniKorn shows 9MB free, but in reality the node has 9,500,000 bytes free, because a pod we scheduled earlier did not use 10MB but only 9,500,000 bytes. YuniKorn fails to place the pod, while the autoscaler says there is enough room to place it.
> I know I am splitting hairs here, but it is a real possibility. These failures are really hard to track down and link back to the rounding. Either YuniKorn schedules the pod and the node bind fails with not enough resources, or scale up fails to trigger when expected.
> With the choice of milli for cpu we have far less of an issue, as K8s does not support more than 3 decimal places. In other words, the smallest value used in K8s is {{1m}} for cpu.
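
The two mismatches described above can be reproduced with a small sketch. This is only an illustration under stated assumptions, not YuniKorn's actual conversion code: the names ceilToMB, node, fitsMB and fitsBytes are hypothetical, the node sizes (9,000,001 and 19,000,000 bytes) are chosen only so that the numbers match the examples in the description, and it assumes every size is rounded up to whole base-10 MB independently before any arithmetic is done.

```go
package main

import "fmt"

// bytesPerMB is the base-10 megabyte used in the issue description.
const bytesPerMB int64 = 1_000_000

// ceilToMB rounds a byte count up to whole MB (hypothetical helper).
// One byte over a boundary costs a full extra MB.
func ceilToMB(b int64) int64 {
	return (b + bytesPerMB - 1) / bytesPerMB
}

// node is a hypothetical, simplified view of a node's memory.
type node struct {
	capacity  int64   // total memory in bytes
	allocated []int64 // memory of each pod already placed, in bytes
}

// freeMB is the free memory as seen when every value is rounded up to MB first.
func (n node) freeMB() int64 {
	free := ceilToMB(n.capacity)
	for _, a := range n.allocated {
		free -= ceilToMB(a)
	}
	return free
}

// freeBytes is the free memory when tracked in exact bytes.
func (n node) freeBytes() int64 {
	free := n.capacity
	for _, a := range n.allocated {
		free -= a
	}
	return free
}

// fitsMB reports whether the request fits in the rounded MB view.
func fitsMB(n node, request int64) bool { return ceilToMB(request) <= n.freeMB() }

// fitsBytes reports whether the request fits in the exact byte view.
func fitsBytes(n node, request int64) bool { return request <= n.freeBytes() }

func main() {
	// Case 1: the node looks larger than it is (assumed 9,000,001 byte node).
	n1 := node{capacity: 9_000_001}
	fmt.Println("case 1:", n1.freeMB(), "MB vs", n1.freeBytes(), "bytes;",
		"MB view fits:", fitsMB(n1, 10_000_000),
		"byte view fits:", fitsBytes(n1, 10_000_000))

	// Case 2: the node looks smaller than it is (assumed 19,000,000 byte node
	// with one 9,500,000 byte pod already placed).
	n2 := node{capacity: 19_000_000, allocated: []int64{9_500_000}}
	fmt.Println("case 2:", n2.freeMB(), "MB vs", n2.freeBytes(), "bytes;",
		"MB view fits:", fitsMB(n2, 9_001_000-999),
		"byte view fits:", fitsBytes(n2, 9_000_001))
}
```

In the first case the MB view accepts a 10,000,000 byte pod that the byte view rejects, which corresponds to the failed node bind. In the second case the MB view rejects a 9,000,001 byte pod that the byte view accepts, which corresponds to the scale up that never triggers.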



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
