Posted to issues@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2022/03/16 04:33:00 UTC
[jira] [Updated] (YUNIKORN-1121) MockScheduler addTask ignores resource settings

     [ https://issues.apache.org/jira/browse/YUNIKORN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YUNIKORN-1121:
--------------------------------------------
    Description: 
While reviewing YUNIKORN-1105 I found a bug in the mock scheduler…

Looking through the changes, I saw files being changed that I thought would not need any changes. I ran the tests and they failed without those changes, so I wondered why we were seeing those failures. I then ran the tests in the debugger without the changes I thought were unneeded and saw weird things.
The problem is here:
{code:java}
func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource){code}
This function is hopelessly broken: the ask that gets passed in is completely ignored. That means every task created through it was always interpreted as {{PodQOSBestEffort}} and got its memory set to 1, which used to mean 1M. Now that we have fixed things, it gets set to 1,000,000, i.e. the real 1M.
The breakage is triggered by the function in the resource code which does the right thing:
{code:java}
func GetPodResource(pod *v1.Pod) (resource *si.Resource){code}
In the old setup, as long as the memory for best effort (i.e. 1) was smaller than the resource set for the task, things would just pass without an issue. Since 1 was the smallest possible value, it would always work. Accounting on nodes etc. was most likely way off, but none of these tests checked that anyway.
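
To make the mechanism concrete, here is a minimal sketch of the behaviour described above. It is not the shim code: {{Resource}}, {{Pod}}, {{getPodResource}}, {{buggyPod}} and {{fixedPod}} are hypothetical stand-ins for {{si.Resource}}, {{v1.Pod}}, {{GetPodResource}} and the pod construction inside {{addTask}}, and the quantities are made up for illustration.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for si.Resource:
// a map from resource name to quantity.
type Resource map[string]int64

// Pod is a hypothetical stand-in for v1.Pod that only carries requests.
type Pod struct {
	Requests Resource
}

// bestEffortMemory is the placeholder memory for a pod without requests.
// The issue text mentions the value 1 for the old setup.
const bestEffortMemory int64 = 1

// getPodResource mimics the behaviour described for GetPodResource:
// a pod without any requests is treated as best effort.
func getPodResource(pod Pod) Resource {
	if len(pod.Requests) == 0 {
		return Resource{"memory": bestEffortMemory}
	}
	res := Resource{}
	for name, quantity := range pod.Requests {
		res[name] = quantity
	}
	return res
}

// buggyPod models the described bug: the ask is accepted but never used.
func buggyPod(ask Resource) Pod {
	_ = ask
	return Pod{}
}

// fixedPod models one possible fix: the ask becomes the pod's requests.
func fixedPod(ask Resource) Pod {
	return Pod{Requests: ask}
}

func main() {
	ask := Resource{"memory": 500, "vcore": 2}
	fmt.Println(getPodResource(buggyPod(ask))) // map[memory:1]
	fmt.Println(getPodResource(fixedPod(ask))) // map[memory:500 vcore:2]
}
```

In this sketch the resource derivation is correct in both cases; only the pod that is handed to it differs, which matches the observation that the resource code does the right thing.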

As a result, *all* tests that use resources within a Task via the mock scheduler do not test the real thing, not even close.
It also hinders us from testing failure cases. For example, we can never create a task that does not fit on a node unless the node is full.
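
The failure-case problem can be sketched the same way. The {{fitsIn}} helper and the {{Resource}} type below are hypothetical simplifications, not the scheduler's real fit check, and the node and ask sizes are invented for illustration.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for si.Resource.
type Resource map[string]int64

// fitsIn reports whether every quantity in ask is available on the node.
func fitsIn(available, ask Resource) bool {
	for name, want := range ask {
		if available[name] < want {
			return false
		}
	}
	return true
}

func main() {
	node := Resource{"memory": 100}
	ask := Resource{"memory": 500}     // what the test asked for
	effective := Resource{"memory": 1} // what is seen when the ask is ignored

	fmt.Println(fitsIn(node, ask))       // false: the intended failure case
	fmt.Println(fitsIn(node, effective)) // true: the failure never happens

	// Only a node with nothing left rejects the best-effort placeholder.
	fmt.Println(fitsIn(Resource{"memory": 0}, effective)) // false
}
```

With the ask ignored, a test that expects the 500 request to be rejected by a 100 node would see the task fit anyway.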

  was:
Reviewing YUNIKORN-1105 I found another bug in the mock scheduler…
Looking through the changes I saw files being changed that I thought would not require any changes. I ran the tests and they failed without the change. I was wondering why we were seeing those failures. I ran the tests in the debugger without the changes that I thought were unneeded and saw weird things.
The problem is here:
func (fc *MockScheduler) addTask(appID string, taskID string, ask *si.Resource)
Is hopelessly broken. The ask that gets passed in is completely ignored. That means every task that was created always was interpreted as a {{PodQOSBestEffort}} and got memory set to 1 which used to be 1M. Now that we fixed things it gets set to 1,000,000 or the real 1M.
The breakage is triggered by the function in the resource code which does the right thing:
func GetPodResource(pod *v1.Pod) (resource *si.Resource)
In the old setup as long as the memory for best effort (i.e. 1) was smaller than the resource set for the task things would just pass without an issue. Since 1 was the smallest possible it would always work. Accounting on nodes etc was most likely way off but none of these tests checked that anyway.
This causes *all* tests that use resources within a Task using the mock scheduler to not test the real thing, not even close.
It also hinders us from testing failure cases. We can never create a task that does not fit on a node as an example unless the node is full.


> MockScheduler addTask ignores resource settings
> -----------------------------------------------
>
>                 Key: YUNIKORN-1121
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1121
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: newbie



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
