You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/12/15 16:20:50 UTC

[GitHub] [incubator-yunikorn-scheduler-interface] manirajv06 opened a new pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

manirajv06 opened a new pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58


   ### What is this PR for?
   SI changes to remove ReSyncSchedulerCache plugin and added new fields in AllocationRelease message
   
   ### What type of PR is it?
   * [ ] - Improvement
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-462
   
   ### How should this be tested?
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] wilfred-s commented on pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#issuecomment-1033356126


   Sorry that it has taken so long to get back to you :-(
   Way to long an update but it sets the direction for the other two repos involved
   
   We have one change in mind remove the `ReSyncSchedulerCache` call. The call is made twice from the core to the shim:
   * for new allocations
   * for removed allocations
   
   First new allocations: the message we send is a RMNewAllocationsEvent with all allocations that are new. That event is currently async. That is why we first we call a sync cache update with partial info. The change is that the event will become a sync call. We have enough information in the current Allocations array. This is part of the event we send and we can also call the assume of the pod inside the cache on the shim side by pulling that key from the allocation. We always call assume pod for every new allocation.
   
   Simple change on the SI side: we can remove the `AssumedAllocation` message. Leverage existing information for the AllocationKey
   
   On the remove side we have 4 locations where we send a RMReleaseAllocationEvent. Only in one location, as you pointed out, we also call the sync of the cache. The sync of the cache triggers the forget of an assumed pod. Only looking at the path that sends events back. The cache sync is part of one of these calls.
   1. `handleRMUpdateApplicationEvent` handles removal of an application. Does not call the cache sync.
   2. `updateNode` handles the node removal. Does not call the cache sync.
   3. `schedule` triggers the release of a placeholder. Does not call the cache sync.
   4. `processAllocationReleases` is processing release requests send by the shim. This calls the cache sync.
   The termination type for call 1,2 and 4 is STOPPED_BY_RM. For call 2 it is PLACEHOLDER_REPLACED.
   Every single allocation is assumed as per above description. So we should also forget a pod, remove it, from the assumed pod list when we remove the pod without exception. If we do not we could leak the entry in the assumed pod cache structure. There should be no difference in the communication for any of these cases between core and shim.
   
   Simple change on the SI side: we can remove the `ForgotAllocation` message. Add the `allocationKey` to the `AllocationRelease` message (check case for the new field!)
   
   The core sends the events synchronously. The shim collapses the assume call into the event processing for new allocations, returns as soon as possible forking of the long running tasks. The shim collapses the forget call into the event processing for the remove. If there is any special cases to not forget the assumption of a removed pod then the shim must implement it. The core should not be the one that decides this.
   
   After this we also need to completely remove the `ReSyncSchedulerCacheArgs` and the `ReSyncSchedulerCache` call from the interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] manirajv06 commented on pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#issuecomment-1036471768


   Thanks for your detailed explanation and giving me a clear picture.
   
   Have taken care of SI and Core accordingly. As discussed, once YUNIKORN-876 goes in, we can work on the shim. Summary is, We should call clear the cache on shim side for all these 4 calls. Since we need to clear cache in all 4 calls, then probably we should make all these calls sync?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] manirajv06 commented on pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#issuecomment-994967159


   In addition to changes discussed offline, have also made changes for synchronous release allocations in a separate flow as I thought of not disturbing the other use cases (trigger points) which are ok with async communication. This kind of synchronous release allocations is required only for processAllocationReleases() method because call to ForgotPod() happens only in this flow through ReSyncSchedulerCache plugin.
   
   Once overall approach makes sense, will need to work on unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] wilfred-s commented on a change in pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#discussion_r778492772



##########
File path: scheduler-interface-spec.md
##########
@@ -566,6 +561,10 @@ message AllocationRelease {
   TerminationType terminationType = 4;
   // human-readable message
   string message = 5;
+  // AllocationKey from AllocationAsk
+  string allocationkey = 6;

Review comment:
       This should already be covered in the message as the UUID. Otherwise we currently would not the possibility to release a pod that is not allocated yet (which is what an ask is).

##########
File path: scheduler-interface-spec.md
##########
@@ -566,6 +561,10 @@ message AllocationRelease {
   TerminationType terminationType = 4;
   // human-readable message
   string message = 5;
+  // AllocationKey from AllocationAsk
+  string allocationkey = 6;
+  // update cache, default is false
+  bool updateCache = 7;

Review comment:
       Can you explain why we would ever not want to update the cache. Would that not cause issues with an out of sync cache?
   The cache should be idempotent and update when needed:
   - delete can never fail: if the entry does not exist the result is the same as a successful delete
   - for creates and updates we should do a create OR update call with the end result being a consistent cache representing the correct object.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] wilfred-s closed pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] manirajv06 commented on a change in pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on a change in pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#discussion_r778625575



##########
File path: scheduler-interface-spec.md
##########
@@ -566,6 +561,10 @@ message AllocationRelease {
   TerminationType terminationType = 4;
   // human-readable message
   string message = 5;
+  // AllocationKey from AllocationAsk
+  string allocationkey = 6;
+  // update cache, default is false
+  bool updateCache = 7;

Review comment:
       context#processAllocationReleases is the only place where ReSyncSchedulerCache plugin is used to call ForgetPod in shim side for cache updates in addition to sending RMReleaseAllocationEvent event. RMReleaseAllocationEvent uses AllocationRelease message. AllocationRelease message is being used in few more places (for ex, context#handleRMUpdateApplicationEvent, node decommission etc) in addition to  context#processAllocationReleases in core side. Hence, to help shim whether to do cache operation or not while processing UpdateAllocation callback method, used "updateCache" field




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-scheduler-interface] manirajv06 commented on a change in pull request #58: YUNIKORN-462: Streamline core to shim update on allocation change

Posted by GitBox <gi...@apache.org>.
manirajv06 commented on a change in pull request #58:
URL: https://github.com/apache/incubator-yunikorn-scheduler-interface/pull/58#discussion_r778617714



##########
File path: scheduler-interface-spec.md
##########
@@ -566,6 +561,10 @@ message AllocationRelease {
   TerminationType terminationType = 4;
   // human-readable message
   string message = 5;
+  // AllocationKey from AllocationAsk
+  string allocationkey = 6;

Review comment:
       AssumePod/ForgetPod expects allocationKey to do its cache add/remove operation. Currently, ReSyncSchedulerCache plugin wraps AllocationKey and few more fields under AssumedAllocation/ForgotAllocation message from core side and sends to shim which eventually passes to AssumePod/ForgetPod methods. Now, with the removal of ReSyncSchedulerCache plugin, in this PR, added AllocationKey in AllocationRelease message as AllocationRelease is being passed to shim (as AllocationResponse) through event processing. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org