You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by "zhuqi-lucas (via GitHub)" <gi...@apache.org> on 2023/06/07 10:37:08 UTC

[GitHub] [yunikorn-core] zhuqi-lucas opened a new pull request, #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

zhuqi-lucas opened a new pull request, #561:
URL: https://github.com/apache/yunikorn-core/pull/561

   ### What is this PR for?
   The TestTryAllocatePreemptNode fails sometimes with a different allocation being preempted than the expected alloc1:
   
   ```
   2023-06-07T12:40:35.035+1000	INFO	objects/queue.go:139	configured queue added to scheduler	{"queueName": "root"}
   2023-06-07T12:40:35.035+1000	INFO	objects/queue.go:1190	updating root queue max resources	{"current max": "nil resource", "new max": "map[first:40]"}
   2023-06-07T12:40:35.035+1000	DEBUG	objects/queue.go:850	new parent queue inheriting template from parent queue	{"child queue": "root.parent", "parent queue": "root"}
   2023-06-07T12:40:35.035+1000	INFO	objects/queue.go:139	configured queue added to scheduler	{"queueName": "root.parent"}
   2023-06-07T12:40:35.036+1000	DEBUG	objects/queue.go:343	max resources setting ignored: cannot set zero max resources
   2023-06-07T12:40:35.036+1000	DEBUG	objects/queue.go:350	guaranteed resources setting ignored: cannot set zero max resources
   2023-06-07T12:40:35.036+1000	INFO	objects/queue.go:139	configured queue added to scheduler	{"queueName": "root.unlimited"}
   2023-06-07T12:40:35.036+1000	DEBUG	objects/queue.go:343	max resources setting ignored: cannot set zero max resources2023-06-07T12:40:35.036+1000	INFO	objects/queue.go:139	configured queue added to scheduler	{"queueName": "root.parent.child1"}
   2023-06-07T12:40:35.036+1000	DEBUG	objects/queue.go:343	max resources setting ignored: cannot set zero max resources
   2023-06-07T12:40:35.036+1000	INFO	objects/queue.go:139	configured queue added to scheduler	{"queueName": "root.parent.child2"}
   2023-06-07T12:40:35.036+1000	INFO	objects/application.go:184	Unknown gang scheduling style, using soft style as default	{"gang scheduling style": ""}
   2023-06-07T12:40:35.036+1000	DEBUG	objects/queue.go:1780	Unknown application	{"applicationID": "app-0"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-0", "source": "New", "destination": "Accepted", "event": "runApplication"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application.go:668	ask added successfully to application	{"appID": "app-0", "ask": "alloc0-0", "placeholder": false, "pendingDelta": "map[first:11]"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application.go:668	ask added successfully to application	{"appID": "app-0", "ask": "alloc0-1", "placeholder": false, "pendingDelta": "map[first:11]"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application.go:184	Unknown gang scheduling style, using soft style as default	{"gang scheduling style": ""}
   2023-06-07T12:40:35.037+1000	DEBUG	objects/queue.go:1780	Unknown application	{"applicationID": "app-1"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-1", "source": "New", "destination": "Accepted", "event": "runApplication"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application.go:668	ask added successfully to application	{"appID": "app-1", "ask": "alloc1", "placeholder": false, "pendingDelta": "map[first:5]"}
   2023-06-07T12:40:35.037+1000	INFO	objects/application.go:668	ask added successfully to application	{"appID": "app-1", "ask": "alloc2", "placeholder": false, "pendingDelta": "map[first:5]"}
   2023-06-07T12:40:35.038+1000	INFO	objects/application.go:184	Unknown gang scheduling style, using soft style as default	{"gang scheduling style": ""}
   2023-06-07T12:40:35.038+1000	DEBUG	objects/queue.go:1780	Unknown application	{"applicationID": "app-2"}
   2023-06-07T12:40:35.038+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-2", "source": "New", "destination": "Accepted", "event": "runApplication"}
   2023-06-07T12:40:35.038+1000	INFO	objects/application.go:668	ask added successfully to application	{"appID": "app-2", "ask": "alloc3", "placeholder": false, "pendingDelta": "map[first:5]"}
   2023-06-07T12:40:35.038+1000	DEBUG	objects/application.go:339	Application state timer initiated	{"appID": "app-0", "state": "Starting", "timeout": "5m0s"}
   2023-06-07T12:40:35.038+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-0", "source": "Accepted", "destination": "Starting", "event": "runApplication"}
   2023-06-07T12:40:35.038+1000	DEBUG	ugm/manager.go:63	Increasing resource usage	{"user": "testuser", "queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.038+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.038+1000	DEBUG	ugm/queue_tracker.go:45	Creating queue tracker object for queue	{"queue": "unlimited"}
   2023-06-07T12:40:35.038+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.038+1000	DEBUG	ugm/manager.go:265	Group tracker already exists and linking (reusing) the same with application	{"application": "app-0", "queue path": "root.unlimited", "user": "testuser", "group": "testgroup"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:45	Creating queue tracker object for queue	{"queue": "unlimited"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	objects/application.go:388	Application state timer cleared	{"appID": "app-0", "state": "Starting"}
   2023-06-07T12:40:35.039+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-0", "source": "Starting", "destination": "Running", "event": "runApplication"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/manager.go:63	Increasing resource usage	{"user": "testuser", "queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.039+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "unlimited", "application": "app-0", "resource": "map[first:11]"}
   2023-06-07T12:40:35.040+1000	DEBUG	objects/application.go:339	Application state timer initiated	{"appID": "app-1", "state": "Starting", "timeout": "5m0s"}
   2023-06-07T12:40:35.040+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-1", "source": "Accepted", "destination": "Starting", "event": "runApplication"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/manager.go:63	Increasing resource usage	{"user": "testuser", "queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.040+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	objects/application.go:388	Application state timer cleared	{"appID": "app-1", "state": "Starting"}
   2023-06-07T12:40:35.041+1000	INFO	objects/application_state.go:133	Application state transition	{"appID": "app-1", "source": "Starting", "destination": "Running", "event": "runApplication"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/manager.go:63	Increasing resource usage	{"user": "testuser", "queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "root.parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "parent.child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	ugm/queue_tracker.go:57	Increasing resource usage	{"queue path": "child1", "application": "app-1", "resource": "map[first:5]"}
   2023-06-07T12:40:35.041+1000	DEBUG	objects/application.go:1454	app reservation check	{"allocationKey": "alloc3", "createTime": "2023-06-07T12:40:25.038+1000", "askAge": "10.003814459s", "reservationDelay": "2s"}
   2023-06-07T12:40:35.042+1000	DEBUG	objects/application.go:1454	app reservation check	{"allocationKey": "alloc3", "createTime": "2023-06-07T12:40:25.038+1000", "askAge": "10.003914s", "reservationDelay": "2s"}
   2023-06-07T12:40:35.042+1000	DEBUG	objects/application.go:1470	found candidate node for app reservation	{"appID": "app-2", "nodeID": "node1", "allocationKey": "alloc3", "reservations": 0, "pendingRepeats": 1}
   2023-06-07T12:40:35.042+1000	DEBUG	objects/application.go:1454	app reservation check	{"allocationKey": "alloc3", "createTime": "2023-06-07T12:39:55.038+1000", "askAge": "40.004148584s", "reservationDelay": "2s"}
   2023-06-07T12:40:35.042+1000	DEBUG	objects/application.go:1454	app reservation check	{"allocationKey": "alloc3", "createTime": "2023-06-07T12:39:55.038+1000", "askAge": "40.004227875s", "reservationDelay": "2s"}
   2023-06-07T12:40:35.042+1000	DEBUG	objects/preemption.go:352	No RM callback plugin registered, using first selected node for preemption	{"NodeID": "node2", "AllocationKey": "alloc3"}
   2023-06-07T12:40:35.043+1000	ERROR	objects/preemption.go:517	BUG: Didn't find instance type in the nodeInstanceTypeMap	{"nodeId": "node2"}
   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Preemptor).tryNodes
   	/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/preemption.go:517
   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Preemptor).TryPreemption
   	/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/preemption.go:537
   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryPreemption
   	/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application.go:1327
   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryAllocate
   	/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application.go:1050
   github.com/apache/yunikorn-core/pkg/scheduler/objects.TestTryAllocatePreemptNode
   	/Users/wilfred/Downloads/yunikorn-1.3.0/apache-yunikorn-1.3.0-src/core/pkg/scheduler/objects/application_test.go:1874
   testing.tRunner	/usr/local/go/src/testing/testing.go:1576
   2023-06-07T12:40:35.043+1000	INFO	objects/preemption.go:559	Preempting task	{"applicationID": "app-1", "allocationKey": "alloc2", "nodeID": "node2", "resources": "map[first:5]"}
   2023-06-07T12:40:35.043+1000	INFO	objects/preemption.go:580	Reserving node for ask after preemption	{"allocationKey": "alloc3", "nodeID": "node2", "victimCount": 1}
   --- FAIL: TestTryAllocatePreemptNode (0.06s)
       application_test.go:1876: assertion failed: expression is false: alloc1.IsPreempted(): alloc1 should have been preempted 
   ```
   
   
   ### What type of PR is it?
   * [ ] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   * Open an issue on Jira https://issues.apache.org/jira/browse/YUNIKORN/
   * Put link here, and add [YUNIKORN-*Jira number*] in PR title, eg. `[YUNIKORN-2] Gang scheduling interface parameters`
   
   ### How should this be tested?
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko closed pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko closed pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test
URL: https://github.com/apache/yunikorn-core/pull/561


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588500613

   Good explain @FrankYang0529, and for test case, for test case we have no RM callback, so it will call the predicateChecks[0], if we need to handle this special for test case, or real case will meet this also for no RM callback?
   cc @craigcondit 
   ```
   // check for RM callback
   	plugin := plugins.GetResourceManagerCallbackPlugin()
   	if plugin == nil {
   		// if a plugin isn't registered, assume checks will succeed and synthesize a result
   		check := predicateChecks[0]
   		log.Logger().Debug("No RM callback plugin registered, using first selected node for preemption",
   			zap.String("NodeID", check.NodeID),
   			zap.String("AllocationKey", check.AllocationKey))
   
   		result := &predicateCheckResult{
   			allocationKey: check.AllocationKey,
   			nodeID:        check.NodeID,
   			success:       true,
   			index:         int(check.StartIndex),
   		}
   		result.populateVictims(victimsByNode)
   		return result
   	}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] codecov[bot] commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580490419

   ## [Codecov](https://app.codecov.io/gh/apache/yunikorn-core/pull/561?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#561](https://app.codecov.io/gh/apache/yunikorn-core/pull/561?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (143cb65) into [master](https://app.codecov.io/gh/apache/yunikorn-core/commit/993f07f0629cba5ff64c9c6d346f2ccff117ca1d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (993f07f) will **not change** coverage.
   > The diff coverage is `n/a`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #561   +/-   ##
   =======================================
     Coverage   74.84%   74.84%           
   =======================================
     Files          71       71           
     Lines       11537    11537           
   =======================================
     Hits         8635     8635           
     Misses       2597     2597           
     Partials      305      305           
   ```
   
   
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] FrankYang0529 commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "FrankYang0529 (via GitHub)" <gi...@apache.org>.
FrankYang0529 commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1587587529

   I tried to run `TestTryAllocatePreemptNode` multiple time and compared logs between expected and unexpected results. I found the different part is following:
   ```
   # expected
   2023-06-12T16:49:19.170+0800    DEBUG   objects/preemption.go:352       No RM callback plugin registered, using first selected node for preemption      {"NodeID": "node1", "AllocationKey": "alloc3"}
   # unexpected
   2023-06-12T22:58:18.571+0800    DEBUG   objects/preemption.go:352       No RM callback plugin registered, using first selected node for preemption      {"NodeID": "node2", "AllocationKey": "alloc3"}
   ```
   Above logs are from the following code, so I checked how we get `predicateChecks`.
   https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L351-L354
   
   We get `predicateChecks` by for-loop `p.nodeAvailableMap` and find fit nodes as following:
   https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L485-L512
   
   The problem is that Golang doesn't guarantee key order in map (we can run [this](https://go.dev/play/p/IEYV9Vy22wx) to check it). If we get `node2` first and it has same `StartIndex` with `node1`, we will preempt `node2`.
   https://github.com/apache/yunikorn-core/blob/d2d7ce89457cbbce140e2737098ae62f6a9d8fee/pkg/scheduler/objects/preemption.go#L342-L345
   
   My suggestion is that we can sort by `NodeID` if `StartIndex` are same.
   
   cc @craigcondit @pbacsko @zhuqi-lucas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580930038

   > > Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?
   > 
   > I'm not sure, as I mentioned, it probably has to do with the order of the tests. AFAIK Go test executor randomizes the test order, precisely to eliminate these kind of state issues. Examine the order of tests (add extra printouts, whatever that works) in both the good and bad scenarios and think about which one is the offending test case.
   
   Thanks for this clarify!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588781517

   Looks good, I found two other flaky tests "TestPlaceholderSmallerThanReal" and "TestPlaceholderSmallerMulti" but I don't think that this change causes those, it's most likely the very low timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588029769

   Thanks @FrankYang0529 for the findings, I suspected a different root cause, but this explains everything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580915736

   > Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?
   
   I'm not sure, as I mentioned, it probably has to do with the order of the test. AFAIK Go test executor randomizes the test order, precisely to eliminate these kind of state issues. Examine the order of tests (add extra printouts, whatever that works) in both the good and bad scenarios and think about which one is the offending test case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580908470

   Thanks @pbacsko and @craigcondit for review, if there are some easy ways for us to find the root cause about these flaky test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] craigcondit commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580870891

   Yes, this fails intermittently. I don't think it's a logic error, but probably something non-deterministic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on a diff in pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on code in PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#discussion_r1221361417


##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
 	// consume capacity with 'unlimited' app
 	alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
 	assert.Assert(t, alloc00 != nil, "alloc00 expected")
-	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)

Review Comment:
   The first allocation ask is 11, so we need to 40 - 11 = 29.



##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
 	// consume capacity with 'unlimited' app
 	alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
 	assert.Assert(t, alloc00 != nil, "alloc00 expected")
-	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)

Review Comment:
   The first allocation ask is 11, so we need to 40 - 11 = 29



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588716975

   Address your comments @FrankYang0529  thanks! cc @pbacsko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588712113

   Sure @pbacsko , i will submit a new PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1580875330

   It has to do with test ordering, a test does something which cascades to the next one, etc. There is a shared state somewhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on a diff in pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on code in PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#discussion_r1227669883


##########
pkg/scheduler/objects/application_test.go:
##########
@@ -1815,18 +1815,18 @@ func TestTryAllocatePreemptNode(t *testing.T) {
 	// consume capacity with 'unlimited' app
 	alloc00 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 40}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
 	assert.Assert(t, alloc00 != nil, "alloc00 expected")
-	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 39}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)
+	alloc01 := app0.tryAllocate(resources.NewResourceFromMap(map[string]resources.Quantity{"first": 29}), 30*time.Second, &preemptionAttemptsRemaining, iterator, iterator, getNode)

Review Comment:
   Remove changes in this file, they're not needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] pbacsko commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "pbacsko (via GitHub)" <gi...@apache.org>.
pbacsko commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588711150

   @zhuqi-lucas can you make the changes based on @FrankYang0529 's comment? I can see this test fail pretty often, we should really fix it quickly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-core] zhuqi-lucas commented on pull request #561: [YUNIKORN-1790] TestTryAllocatePreemptNode: flaky test

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #561:
URL: https://github.com/apache/yunikorn-core/pull/561#issuecomment-1588725470

   > Restore `application_test.go`.
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org