You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by "zhuqi-lucas (via GitHub)" <gi...@apache.org> on 2023/06/07 10:37:08 UTC
[GitHub] [yunikorn-core] zhuqi-lucas opened a new pull request, #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
zhuqi-lucas opened a new pull request, #561:
URL: https://github.com/apache/yunikorn-core/pull/561
### What is this PR for?
The TestTryAllocatePreemptNode fails sometimes with a different allocation being preempted than the expected alloc1:
```
2023-06-07T12:40:35.035+1000 INFO objects/queue.go:139 configured queue added to scheduler {"queueName": "root"}
2023-06-07T12:40:35.035+1000 INFO objects/queue.go:1190 updating root queue max resources {"current max": "nil resource", "new max": "map[first:40]"}
2023-06-07T12:40:35.035+1000 DEBUG objects/queue.go:850 new parent queue inheriting template from parent queue {"child queue": "root.parent", "parent queue": "root"}
2023-06-07T12:40:35.035+1000 INFO objects/queue.go:139 configured queue added to scheduler {"queueName": "root.parent"}
2023-06-07T12:40:35.036+1000 DEBUG objects/queue.go:343 max resources setting ignored: cannot set zero max resources
2023-06-07T12:40:35.036+1000 DEBUG objects/queue.go:350 guaranteed resources setting ignored: cannot set zero max resources
2023-06-07T12:40:35.036+1000 INFO objects/queue.go:139 configured queue added to scheduler {"queueName": "root.unlimited"}
2023-06-07T12:40:35.036+1000 DEBUG objects/queue.go:343 max resources setting ignored: cannot set zero max resources2023-06-07T12:40:35.036+1000 INFO objects/queue.go:139 configured queue added to scheduler {"queueName": "root.parent.child1"}
2023-06-07T12:40:35.036+1000 DEBUG objects/queue.go:343 max resources setting ignored: cannot set zero max resources
2023-06-07T12:40:35.036+1000 INFO objects/queue.go:139 configured queue added to scheduler {"queueName": "root.parent.child2"}
2023-06-07T12:40:35.036+1000 INFO objects/application.go:184 Unknown gang scheduling style, using soft style as default {"gang scheduling style": ""}
2023-06-07T12:40:35.036+1000 DEBUG objects/queue.go:1780 Unknown application {"applicationID": "app-0"}
2023-06-07T12:40:35.037+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-0", "source": "New", "destination": "Accepted", "event": "runApplication"}
2023-06-07T12:40:35.037+1000 INFO objects/application.go:668 ask added successfully to application {"appID": "app-0", "ask": "alloc0-0", "placeholder": false, "pendingDelta": "map[first:11]"}
2023-06-07T12:40:35.037+1000 INFO objects/application.go:668 ask added successfully to application {"appID": "app-0", "ask": "alloc0-1", "placeholder": false, "pendingDelta": "map[first:11]"}
2023-06-07T12:40:35.037+1000 INFO objects/application.go:184 Unknown gang scheduling style, using soft style as default {"gang scheduling style": ""}
2023-06-07T12:40:35.037+1000 DEBUG objects/queue.go:1780 Unknown application {"applicationID": "app-1"}
2023-06-07T12:40:35.037+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-1", "source": "New", "destination": "Accepted", "event": "runApplication"}
2023-06-07T12:40:35.037+1000 INFO objects/application.go:668 ask added successfully to application {"appID": "app-1", "ask": "alloc1", "placeholder": false, "pendingDelta": "map[first:5]"}
2023-06-07T12:40:35.037+1000 INFO objects/application.go:668 ask added successfully to application {"appID": "app-1", "ask": "alloc2", "placeholder": false, "pendingDelta": "map[first:5]"}
2023-06-07T12:40:35.038+1000 INFO objects/application.go:184 Unknown gang scheduling style, using soft style as default {"gang scheduling style": ""}
2023-06-07T12:40:35.038+1000 DEBUG objects/queue.go:1780 Unknown application {"applicationID": "app-2"}
2023-06-07T12:40:35.038+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-2", "source": "New", "destination": "Accepted", "event": "runApplication"}
2023-06-07T12:40:35.038+1000 INFO objects/application.go:668 ask added successfully to application {"appID": "app-2", "ask": "alloc3", "placeholder": false, "pendingDelta": "map[first:5]"}
2023-06-07T12:40:35.038+1000 DEBUG objects/application.go:339 Application state timer initiated {"appID": "app-0", "state": "Starting", "timeout": "5m0s"}
2023-06-07T12:40:35.038+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-0", "source": "Accepted", "destination": "Starting", "event": "runApplication"}
2023-06-07T12:40:35.038+1000 DEBUG ugm/manager.go:63 Increasing resource usage {"user": "testuser", "queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.038+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.038+1000 DEBUG ugm/queue_tracker.go:45 Creating queue tracker object for queue {"queue": "unlimited"}
2023-06-07T12:40:35.038+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.038+1000 DEBUG ugm/manager.go:265 Group tracker already exists and linking (reusing) the same with application {"application": "app-0", "queue path": "root.unlimited", "user": "testuser", "group": "testgroup"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:45 Creating queue tracker object for queue {"queue": "unlimited"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG objects/application.go:388 Application state timer cleared {"appID": "app-0", "state": "Starting"}
2023-06-07T12:40:35.039+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-0", "source": "Starting", "destination": "Running", "event": "runApplication"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/manager.go:63 Increasing resource usage {"user": "testuser", "queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.039+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
2023-06-07T12:40:35.040+1000 DEBUG objects/application.go:339 Application state timer initiated {"appID": "app-1", "state": "Starting", "timeout": "5m0s"}
2023-06-07T12:40:35.040+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-1", "source": "Accepted", "destination": "Starting", "event": "runApplication"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/manager.go:63 Increasing resource usage {"user": "testuser", "queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.040+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG objects/application.go:388 Application state timer cleared {"appID": "app-1", "state": "Starting"}
2023-06-07T12:40:35.041+1000 INFO objects/application_state.go:133 Application state transition {"appID": "app-1", "source": "Starting", "destination": "Running", "event": "runApplication"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/manager.go:63 Increasing resource usage {"user": "testuser", "queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG ugm/queue_tracker.go:57 Increasing resource usage {"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
2023-06-07T12:40:35.041+1000 DEBUG objects/application.go:1454 app reservation check {"allocationKey": "alloc3", "createTime": "2023-06-07T12:40:25.038+1000", "askAge": "10.003814459s", "reservationDelay": "2s"}
2023-06-07T12:40:35.042+1000 DEBUG objects/application.go:1454 app reservation check {"allocationKey": "alloc3", "createTime": "2023-06-07T12:40:25.038+1000", "askAge": "10.003914s", "reservationDelay": "2s"}
2023-06-07T12:40:35.042+1000 DEBUG objects/application.go:1470 found candidate node for app reservation {"appID": "app-2", "nodeID": "node1", "allocationKey": "alloc3", "reservations": 0, "pendingRepeats": 1}
2023-06-07T12:40:35.042+1000 DEBUG objects/application.go:1454 app reservation check {"allocationKey": "alloc3", "createTime": "2023-06-07T12:39:55.038+1000", "askAge": "40.004148584s", "reservationDelay": "2s"}
2023-06-07T12:40:35.042+1000 DEBUG objects/application.go:1454 app reservation check {"allocationKey": "alloc3", "createTime": "2023-06-07T12:39:55.038+1000", "askAge": "40.004227875s", "reservationDelay": "2s"}
2023-06-07T12:40:35.042+1000 DEBUG objects/preemption.go:352 No RM callback plugin registered, using first selected node for preemption {"NodeID": "node2", "AllocationKey": "alloc3"}
2023-06-07T12:40:35.043+1000 ERROR objects/preemption.go:517 BUG: Didn't find instance type in the nodeInstanceTypeMap {"nodeId": "node2"}
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Preemptor).tryNodes
/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/preemption.go:517
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Preemptor).TryPreemption
/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/preemption.go:537
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryPreemption
/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application.go:1327
github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryAllocate
/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application.go:1050
github.com/apache/yunikorn-core/pkg/scheduler/objects.TestTryAllocatePreemptNode
/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application_test.go:1874
testing.tRunner /usr/local/go/src/testing/testing.go:1576
2023-06-07T12:40:35.043+1000 INFO objects/preemption.go:559 Preempting task {"applicationID": "app-1", "allocationKey": "alloc2", "nodeID": "node2", "resources": "map[first:5]"}
2023-06-07T12:40:35.043+1000 INFO objects/preemption.go:580 Reserving node for ask after preemption {"allocationKey": "alloc3", "nodeID": "node2", "victimCount": 1}
--- FAIL: TestTryAllocatePreemptNode (0.06s)
application_test.go:1876: assertion failed: expression is false: alloc1.IsPreempted(): alloc1 should have been preempted
```
### What type of PR is it?
* [ ] - Bug Fix
* [ ] - Improvement
* [ ] - Feature
* [ ] - Documentation
* [ ] - Hot Fix
* [ ] - Refactoring
### Todos
* [ ] - Task
### What is the Jira issue?
* Open an issue on Jira https://issues.apache.org/jira/browse/YUNIKORN/
* Put link here, and add [YUNIKORN-*Jira number*] in PR title, eg. `[YUNIKORN-2] Gang scheduling interface parameters`
### How should this be tested?
### Screenshots (if appropriate)
### Questions:
* [ ] - The licenses files need update.
* [ ] - There is breaking changes for older versions.
* [ ] - It needs documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko closed pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko closed pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
URL: https://github.com/apache/yunikorn-core/pull/561
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588500613
Good explain @FrankYang0529, and for test case, for test case we have no RM callback, so it will call the predicateChecks[0], if we need to handle this special for test case, or real case will meet this also for no RM callback?
cc @craigcondit
```
// check for RM callback
plugin := plugins.GetResourceManagerCallbackPlugin()
if plugin == nil {
// if a plugin isn't registered, assume checks will succeed and synthesize a result
check := predicateChecks[0]
log.Logger().Debug("No RM callback plugin registered, using first selected node for preemption",
zap.String("NodeID", check.NodeID),
zap.String("AllocationKey", check.AllocationKey))
result := &predicateCheckResult{
allocationKey: check.AllocationKey,
nodeID: check.NodeID,
success: true,
index: int(check.StartIndex),
}
result.populateVictims(victimsByNode)
return result
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] codecov[bot] commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580490419
## [Codecov](https://app.codecov.io/gh/apache/yunikorn-core/pull/561?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
> Merging [#561](https://app.codecov.io/gh/apache/yunikorn-core/pull/561?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (143cb65) into [master](https://app.codecov.io/gh/apache/yunikorn-core/commit/993f07f0629cba5ff64c9c6d346f2ccff117ca1d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (993f07f) will **not change** coverage.
> The diff coverage is `n/a`.
```diff
@@ Coverage Diff @@
## master #561 +/- ##
=======================================
Coverage 74.84% 74.84%
=======================================
Files 71 71
Lines 11537 11537
=======================================
Hits 8635 8635
Misses 2597 2597
Partials 305 305
```
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] FrankYang0529 commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "FrankYang0529 (via GitHub)" <gi...@apache.org>.
FrankYang0529 commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1587587529
I tried to run `TestTryAllocatePreemptNode` multiple time and compared logs between expected and unexpected results. I found the different part is following:
```
# expected
2023-06-12T16:49:19.170+0800 DEBUG objects/preemption.go:352 No RM callback plugin registered, using first selected node for preemption {"NodeID": "node1", "AllocationKey": "alloc3"}
# unexpected
2023-06-12T22:58:18.571+0800 DEBUG objects/preemption.go:352 No RM callback plugin registered, using first selected node for preemption {"NodeID": "node2", "AllocationKey": "alloc3"}
```
Above logs are from the following code, so I checked how we get `predicateChecks`.
https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L351-L354
We get `predicateChecks` by for-loop `p.nodeAvailableMap` and find fit nodes as following:
https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L485-L512
The problem is that Golang doesn't guarantee key order in map (we can run [this](https://go.dev/play/p/IEYV9Vy22wx) to check it). If we get `node2` first and it has same `StartIndex` with `node1`, we will preempt `node2`.
https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L342-L345
My suggestion is that we can sort by `NodeID` if `StartIndex` are same.
cc @craigcondit @pbacsko @zhuqi-lucas.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580930038
> > Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?
>
> I'm not sure, as I mentioned, it probably has to do with the order of the tests. AFAIK Go test executor randomizes the test order, precisely to eliminate these kind of state issues. Examine the order of tests (add extra printouts, whatever that works) in both the good and bad scenarios and think about which one is the offending test case.
Thanks for this clarify!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588781517
Looks good, I found two other flaky tests "TestPlaceholderSmallerThanReal" and "TestPlaceholderSmallerMulti" but I don't think that this change causes those, it's most likely the very low timeout.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588029769
Thanks @FrankYang0529 for the findings, I suspected a different root cause, but this explains everything.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580915736
> Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?
I'm not sure, as I mentioned, it probably has to do with the order of the test. AFAIK Go test executor randomizes the test order, precisely to eliminate these kind of state issues. Examine the order of tests (add extra printouts, whatever that works) in both the good and bad scenarios and think about which one is the offending test case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580908470
Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] craigcondit commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580870891
Yes, this fails intermittently. I don't think it's a logic error, but probably something non-deterministic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on a diff in pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on code in PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#discussion_r1221361417
##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
// consume capacity with 'unlimited' app
alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
assert.Assert(t, alloc00 != nil, "alloc00 expected")
- alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+ alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
Review Comment:
The first allocation ask is 11, so we need to 40 - 11 = 29.
##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
// consume capacity with 'unlimited' app
alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
assert.Assert(t, alloc00 != nil, "alloc00 expected")
- alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+ alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
Review Comment:
The first allocation ask is 11, so we need to 40 - 11 = 29
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588716975
Address your comments @FrankYang0529 thanks! cc @pbacsko
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588712113
Sure @pbacsko , i will submit a new PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580875330
It has to do with test ordering, a test does something which cascades to the next one, etc. There is a shared state somewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#discussion_r1227669883
##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
// consume capacity with 'unlimited' app
alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
assert.Assert(t, alloc00 != nil, "alloc00 expected")
- alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+ alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
Review Comment:
Remove changes in this file, they're not needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588711150
@zhuqi-lucas can you make the changes based on @FrankYang0529 's comment? I can see this test fail pretty often, we should really fix it quickly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588725470
> Restore `application_test.go`.
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org