You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by "pbacsko (via GitHub)" <gi...@apache.org> on 2023/02/16 16:58:49 UTC

[GitHub] [yunikorn-core] pbacsko commented on pull request #502: [YUNIKORN-1566] Config update can leave queue settings in an inconsistent state

pbacsko commented on PR #502:
URL: https://github.com/apache/yunikorn-core/pull/502#issuecomment-1433407755

   > Looking through the changes I am not too worried about the new config not being OK to apply. The config validation looks at the new config and makes sure it all parses correctly and that the structure is valid. If we have a gap in that area we should update the config validator. 
   
   I checked the config validator. Yes, resource and ACL strings can cause problems and those are validated.
   The method `Queue.setTemplate()` can return an error and it's not validated, however if this fails we swallow it and return a nil without printing anything. This is suspect.
   
   > 
   > * the in memory config shows a queue `root.parent.leaf`  The newly created config has only got `root.parent`. Do we remove the queue `leaf`? Parent does not change type to a leaf and it was an auto parent before. How do we handle this...
   By default, `leaf` is marked for removal and goes into `Draining` state (if it is a managed queue). Applications in that queue keep running. In case of a parent, the entire hierarchy below it is marked for removal.
   
   > * the in memory config shows `root.my-queue` defined as a leaf. The newly created config has `root.my-queue.spark`. That as far as I can tell causes an automatic type change to parent for my-queue. What happens with the applications that run there?
   I think that is what we partly discussed in the JIRA ticket. We can create a new queue like `root.preserved` and move existing applications into that queue. Plus we have to update queue references in the `Application` objects. I can do a POC in this PR and let's see if it is any good.
   
   > * the in memory config shows `root.my-queue.spark` Both `my-queue` and `spark` are dynamic queues. The newly created config has `root.my-queue` as a leaf. That breaks things badly I think.
   After some investigation, it looks like the dynamic queues are automatically cleaned up by the partition manager as soon as they become empty. Other than that, I don't see special treatment.
   I suggest the same approach, running applications should be moved to a pre-defined queue.
   
   Another thing that is important that asks and allocations cause the priorities to be updated and re-calculated. Basically, we also need to do two things:
   1. Make sure that no preemption/scheduling cycle runs during the config update
   2. Recalculate all priorities if there was a change
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org