You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2020/09/11 07:10:27 UTC

[GitHub] [incubator-yunikorn-k8shim] TaoYang526 opened a new pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

TaoYang526 opened a new pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190


   Details please refer to [YUNIKORN-410](https://issues.apache.org/jira/browse/YUNIKORN-410)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#discussion_r487263705



##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-691287098






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] edited a comment on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] edited a comment on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=h1) Report
   > Merging [#190](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/d093b0d550e4aff685fdd42ed166db0ec77d27dd?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #190      +/-   ##
   ==========================================
   - Coverage   58.94%   58.94%   -0.01%     
   ==========================================
     Files          33       33              
     Lines        3303     3305       +2     
   ==========================================
   + Hits         1947     1948       +1     
   - Misses       1285     1286       +1     
     Partials       71       71              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/node\_coordinator.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/diff?src=pr&el=tree#diff-cGtnL2NhY2hlL25vZGVfY29vcmRpbmF0b3IuZ28=) | `65.62% <100.00%> (+1.10%)` | :arrow_up: |
   | [pkg/shim/scheduler\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/diff?src=pr&el=tree#diff-cGtnL3NoaW0vc2NoZWR1bGVyX21vY2suZ28=) | `85.00% <0.00%> (-0.84%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=footer). Last update [d093b0d...cd65189](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=h1) Report
   > Merging [#190](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/d093b0d550e4aff685fdd42ed166db0ec77d27dd?el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #190      +/-   ##
   ==========================================
   - Coverage   58.94%   58.94%   -0.01%     
   ==========================================
     Files          33       33              
     Lines        3303     3305       +2     
   ==========================================
   + Hits         1947     1948       +1     
   - Misses       1285     1286       +1     
     Partials       71       71              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/node\_coordinator.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/diff?src=pr&el=tree#diff-cGtnL2NhY2hlL25vZGVfY29vcmRpbmF0b3IuZ28=) | `65.62% <100.00%> (+1.10%)` | :arrow_up: |
   | [pkg/shim/scheduler\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190/diff?src=pr&el=tree#diff-cGtnL3NoaW0vc2NoZWR1bGVyX21vY2suZ28=) | `85.00% <0.00%> (-0.84%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=footer). Last update [d093b0d...cd65189](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/190?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#discussion_r487263705



##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-694640099


   I echo the question from @yangwwei on the fact that a pod status should not be going back to pending.
   
   Based on the response that is part of an old [kubernetes issue](https://github.com/kubernetes/kubernetes/issues/10319), and the [pod documentation](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/) a pod should never go back to a pending state.
   
   The status is a one way change from pending -> running -> (succeeded || failed)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-691287098


   hi @TaoYang526 
   
   I am trying to understand how this could happen. In the code, we already did a check
   
   ```
   oldPod.Status.Phase != newPod.Status.Phase
   ```
   
   that means when we try to update the occupied resource, that means the pod is transiting from `Pending` to `Running`. 
   
   Pod phases are: `Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`. Among them, `Succeed` and `Failed` are terminated state, which means they cannot transit to `Running` at all. So the only possibility is from `Pending` to `Running`, or `Unknown` to `Running`. The case you are seeing, is it from `Unknown` to `Running`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#discussion_r487263705



##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-691287098






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei closed pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei closed pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-691287098


   hi @TaoYang526 
   
   I am trying to understand how this could happen. In the code, we already did a check
   
   ```
   oldPod.Status.Phase != newPod.Status.Phase
   ```
   
   that means when we try to update the occupied resource, that means the pod is transiting from `Pending` to `Running`. 
   
   Pod phases are: `Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`. Among them, `Succeed` and `Failed` are terminated state, which means they cannot transit to `Running` at all. So the only possibility is from `Pending` to `Running`, or `Unknown` to `Running`. The case you are seeing, is it from `Unknown` to `Running`?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#discussion_r487263705



##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 

##########
File path: pkg/cache/node_coordinator_test.go
##########
@@ -248,3 +248,45 @@ func TestDeleteTerminatedPod(t *testing.T) {
 	coordinator.deletePod(pod2)
 	assert.Equal(t, executed, false)
 }
+
+func TestUpdatePodPendingRunningAlternatively(t *testing.T) {
+	mockedSchedulerAPI := newMockSchedulerAPI()
+	nodes := newSchedulerNodes(mockedSchedulerAPI, NewTestSchedulerCache())
+	host1 := utils.NodeForTest("HOST1", "10G", "10")
+	nodes.addNode(host1)
+	coordinator := newNodeResourceCoordinator(nodes)
+
+	// init pod (pod1) and changed pod (pod2)
+	pod1 := utils.PodForTest("pod1", "1G", "500m")
+	pod1.SetUID("UID-01")
+	pod2 := utils.PodForTest("pod1", "1G", "500m")
+	pod2.SetUID("UID-01")
+
+	// pod state changed from pending to running, trigger an update
+	pod1.Status.Phase = v1.PodPending
+	pod2.Status.Phase = v1.PodRunning
+	pod1.Spec.NodeName = "HOST1"
+	pod2.Spec.NodeName = "HOST1"
+	coordinator.updatePod(pod1, pod2)
+	node1 := coordinator.nodes.nodesMap["HOST1"]
+	assert.Equal(t, node1.occupied.Resources[constants.Memory].Value, int64(1000))
+	assert.Equal(t, node1.occupied.Resources[constants.CPU].Value, int64(500))
+
+	// pod state changed from running to pending,

Review comment:
       Can pod state change from `Running` to `Pending`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] edited a comment on pull request #190: [YUNIKORN-410] Pod state change may cause incorrect update on SchedulerNode#occupied

Posted by GitBox <gi...@apache.org>.
codecov[bot] edited a comment on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/190#issuecomment-690934990






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org