You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2020/12/10 16:12:33 UTC

[GitHub] [incubator-yunikorn-core] manirajv06 opened a new pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

manirajv06 opened a new pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229


   Added updateNode in Partition object to update the partition resource and root queue resource whenever node resources changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] manirajv06 commented on a change in pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on a change in pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229#discussion_r540979955



##########
File path: pkg/scheduler/partition.go
##########
@@ -586,6 +586,16 @@ func (pc *PartitionContext) removeNode(nodeID string) []*objects.Allocation {
 	return pc.removeNodeInternal(nodeID)
 }
 
+// Update a node

Review comment:
       > Can you also add unit test for the newly created method as part of the partition tests?
   
   Added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s closed pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] codecov[bot] commented on pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229#issuecomment-762705626


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=h1) Report
   > Merging [#229](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=desc) (4d8e4d5) into [master](https://codecov.io/gh/apache/incubator-yunikorn-core/commit/01f5288eb5931a16834e0402af9efb308f946ad4?el=desc) (01f5288) will **increase** coverage by `0.00%`.
   > The diff coverage is `80.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/graphs/tree.svg?width=650&height=150&src=pr&token=SB9NrIi3Hy)](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master     #229    +/-   ##
   ========================================
     Coverage   63.12%   63.13%            
   ========================================
     Files          59       60     +1     
     Lines        4879     5205   +326     
   ========================================
   + Hits         3080     3286   +206     
   - Misses       1657     1758   +101     
   - Partials      142      161    +19     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [pkg/scheduler/context.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9jb250ZXh0Lmdv) | `6.28% <0.00%> (+6.28%)` | :arrow_up: |
   | [pkg/scheduler/objects/node.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL25vZGUuZ28=) | `81.98% <66.66%> (-1.36%)` | :arrow_down: |
   | [pkg/scheduler/partition.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9wYXJ0aXRpb24uZ28=) | `63.09% <100.00%> (-1.82%)` | :arrow_down: |
   | [pkg/scheduler/objects/application.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FwcGxpY2F0aW9uLmdv) | `49.06% <0.00%> (-6.25%)` | :arrow_down: |
   | [pkg/metrics/scheduler.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL21ldHJpY3Mvc2NoZWR1bGVyLmdv) | `61.58% <0.00%> (-3.69%)` | :arrow_down: |
   | [pkg/common/resources/resources.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL2NvbW1vbi9yZXNvdXJjZXMvcmVzb3VyY2VzLmdv) | `96.02% <0.00%> (-1.50%)` | :arrow_down: |
   | [pkg/webservice/handlers.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3dlYnNlcnZpY2UvaGFuZGxlcnMuZ28=) | `54.13% <0.00%> (-0.95%)` | :arrow_down: |
   | [pkg/scheduler/objects/queue.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL3F1ZXVlLmdv) | `69.44% <0.00%> (-0.54%)` | :arrow_down: |
   | ... and [7 more](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=footer). Last update [01f5288...4d8e4d5](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/229?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s commented on a change in pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229#discussion_r551097097



##########
File path: pkg/scheduler/partition_test.go
##########
@@ -1079,3 +1079,55 @@ func TestUpdateRootQueue(t *testing.T) {
 	assert.Equal(t, partition.GetQueue("root.leaf").CurrentState(), objects.Draining.String(), "leaf queue should have been marked for removal")
 	assert.Equal(t, partition.GetQueue("root.parent").CurrentState(), objects.Draining.String(), "parent queue should have been marked for removal")
 }
+
+func TestUpdateNode(t *testing.T) {
+	partition, err := newBasePartition()
+	assert.NilError(t, err, "test partition create failed with error")
+
+	newRes, err := resources.NewResourceFromConf(map[string]string{"memory": "400", "vcore": "30"})
+	assert.NilError(t, err, "failed to create resource")
+
+	err = partition.AddNode(newNodeMaxResource("test", newRes), nil)
+	assert.NilError(t, err, "test node add failed unexpected")
+	assert.Equal(t, 1, len(partition.nodes), "node list not correct")
+
+	if !resources.Equals(newRes, partition.GetTotalPartitionResource()) {
+		t.Errorf("Expected partition resource %s, doesn't match with actual partition resource %s", newRes, partition.GetTotalPartitionResource())
+	}
+
+	//delta resource for a node with mem as 450 and vcores as 40 (both mem and vcores has increeased)

Review comment:
       This causes the lint check to fail: comments must have a space between the `//` and the text.

##########
File path: pkg/scheduler/tests/operation_test.go
##########
@@ -371,13 +371,162 @@ partitions:
 	})
 
 	assert.NilError(t, err, "UpdateRequest failed")
-
 	waitForAvailableNodeResource(t, ms.scheduler.GetClusterContext(), "[rm:123]default",
 		[]string{"node-1:1234"}, 300, 1000)
 	assert.Equal(t, int64(node1.GetCapacity().Resources[resources.MEMORY]), int64(300))
 	assert.Equal(t, int64(node1.GetCapacity().Resources[resources.VCORE]), int64(10))
 	assert.Equal(t, int64(schedulingNode1.GetAllocatedResource().Resources[resources.MEMORY]), int64(0))
 	assert.Equal(t, int64(schedulingNode1.GetAvailableResource().Resources[resources.MEMORY]), int64(300))
+	newRes, err := resources.NewResourceFromConf(map[string]string{"memory": "300", "vcore": "10"})
+	assert.NilError(t, err, "failed to create resource")
+	if !resources.Equals(newRes, partitionInfo.GetTotalPartitionResource()) {
+		t.Errorf("Expected partition resource %s, doesn't match with actual partition resource %s", newRes, partitionInfo.GetTotalPartitionResource())
+	}
+	if !resources.Equals(newRes, partitionInfo.GetQueue("root").GetMaxResource()) {
+		t.Errorf("Expected partition resource %s, doesn't match with actual partition resource %s", newRes, partitionInfo.GetQueue("root").GetMaxResource())
+	}
+
+	// update node capacity with more mem and same vcores
+	err = ms.proxy.Update(&si.UpdateRequest{
+		UpdatedNodes: []*si.UpdateNodeInfo{
+			{
+				NodeID:     "node-1:1234",
+				Attributes: map[string]string{},
+				SchedulableResource: &si.Resource{
+					Resources: map[string]*si.Quantity{
+						"memory": {Value: 100},
+						"vcore":  {Value: 20},
+					},
+				},
+				Action: si.UpdateNodeInfo_UPDATE,
+			},
+		},
+		RmID: "rm:123",
+	})
+	assert.NilError(t, err, "UpdateRequest failed")
+	waitForAvailableNodeResource(t, ms.scheduler.GetClusterContext(), "[rm:123]default",
+		[]string{"node-1:1234"}, 100, 1000)

Review comment:
       No need to split the line to keep length down, make it one line. (multiple times found)

##########
File path: pkg/scheduler/objects/node.go
##########
@@ -127,15 +127,18 @@ func (sn *Node) GetCapacity() *resources.Resource {
 	return sn.totalResource.Clone()
 }
 
-func (sn *Node) SetCapacity(newCapacity *resources.Resource) {
+func (sn *Node) SetCapacity(newCapacity *resources.Resource) *resources.Resource {
 	sn.Lock()
 	defer sn.Unlock()
+	var delta *resources.Resource = nil
 	if resources.Equals(sn.totalResource, newCapacity) {
 		log.Logger().Debug("skip updating capacity, not changed")
-		return
+		return delta
 	}
+	delta = resources.Sub(newCapacity, sn.totalResource)

Review comment:
       Simplify this to a `return nil` and create a mew var using `:=`, can remove the var delta when you do that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s commented on a change in pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229#discussion_r540588990



##########
File path: pkg/scheduler/partition.go
##########
@@ -586,6 +586,16 @@ func (pc *PartitionContext) removeNode(nodeID string) []*objects.Allocation {
 	return pc.removeNodeInternal(nodeID)
 }
 
+// Update a node

Review comment:
       Explain what you update on the node and why that triggers the partition update etc.

##########
File path: pkg/scheduler/partition.go
##########
@@ -586,6 +586,16 @@ func (pc *PartitionContext) removeNode(nodeID string) []*objects.Allocation {
 	return pc.removeNodeInternal(nodeID)
 }
 
+// Update a node
+func (pc *PartitionContext) updateNode(node *objects.Node, newCapacity *resources.Resource) {
+	pc.Lock()
+	defer pc.Unlock()
+	pc.totalPartitionResource.AddTo(newCapacity)
+	pc.totalPartitionResource.SubFrom(node.GetCapacity())
+	pc.root.SetMaxResource(pc.totalPartitionResource)
+	node.SetCapacity(newCapacity)

Review comment:
       This is pretty lock heavy and might take locks that are not needed at all.
   Can we instead change the return of `node.SetCapacity()` to return a `*resources.Resource` and return the change delta? If that delta not zero or not nil we do not update the partition and root queue. Calling `node.SetCapacity()` outside the partition lock and keep the time we hold the partition lock to a minimum.

##########
File path: pkg/scheduler/tests/operation_test.go
##########
@@ -378,6 +383,67 @@ partitions:
 	assert.Equal(t, int64(node1.GetCapacity().Resources[resources.VCORE]), int64(10))
 	assert.Equal(t, int64(schedulingNode1.GetAllocatedResource().Resources[resources.MEMORY]), int64(0))
 	assert.Equal(t, int64(schedulingNode1.GetAvailableResource().Resources[resources.MEMORY]), int64(300))
+
+	newRes, err = resources.NewResourceFromConf(map[string]string{"memory": "300", "vcore": "10"})
+	assert.NilError(t, err, "failed to create resource")
+	assert.Equal(t, newRes.DAOString(), partitionInfo.GetTotalPartitionResource().DAOString())
+	assert.Equal(t, newRes.DAOString(), partitionInfo.GetQueue("root").GetMaxResource().DAOString())
+
+	// Register a node
+	err = ms.proxy.Update(&si.UpdateRequest{

Review comment:
       We have a node already, we can perform the second part of the test (lowering the resources) using the same node.

##########
File path: pkg/scheduler/tests/operation_test.go
##########
@@ -352,6 +352,11 @@ partitions:
 	assert.Equal(t, int64(schedulingNode1.GetAllocatedResource().Resources[resources.MEMORY]), int64(0))
 	assert.Equal(t, int64(schedulingNode1.GetAvailableResource().Resources[resources.MEMORY]), int64(100))
 
+	newRes, err := resources.NewResourceFromConf(map[string]string{"memory": "100", "vcore": "20"})
+	assert.NilError(t, err, "failed to create resource")
+	assert.Equal(t, newRes.DAOString(), partitionInfo.GetTotalPartitionResource().DAOString())
+	assert.Equal(t, newRes.DAOString(), partitionInfo.GetQueue("root").GetMaxResource().DAOString())

Review comment:
       Instead of using string comparisons use direct resource compare call either per value you want to check or by using resources.Equals() as is done throughout the test. The string comparison does not add anything extra.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] manirajv06 commented on a change in pull request #229: YUNIKORN-466: Node capacity update does not update root queue or partition

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on a change in pull request #229:
URL: https://github.com/apache/incubator-yunikorn-core/pull/229#discussion_r540981194



##########
File path: pkg/scheduler/partition.go
##########
@@ -586,6 +586,16 @@ func (pc *PartitionContext) removeNode(nodeID string) []*objects.Allocation {
 	return pc.removeNodeInternal(nodeID)
 }
 
+// Update a node

Review comment:
       > Explain what you update on the node and why that triggers the partition update etc.
   
   On receiving node update event signal, context#updateNodes calls partition#updateNode() to set the correct value for both partition total resource and root queue max resource.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org