Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/11/08 03:16:31 UTC

[GitHub] [incubator-yunikorn-core] wilfred-s edited a comment on pull request #332: YUNIKORN-905: Core side changes of YUNIKORN-337

wilfred-s edited a comment on pull request #332:
URL: https://github.com/apache/incubator-yunikorn-core/pull/332#issuecomment-962774771


   Based on the unit test failures, we must make sure that the recovery order in the shim is correct: recover the applications first, then the nodes.
   
   Looking at the change, we might have had an issue in the RMProxy for a long time. I do think that we need to add a retry to the node update when we recover a node. Even in the previous implementation there was no guarantee that all the applications were added before a node was recovered. The unit tests relied on the order in which events were processed to make sure it worked.
   
   There was _never_ an ordering requirement on the events sent by a shim, nor any use of complex update events by the shim to enforce such an ordering. An event to recover a node could arrive in a separate UpdateRequest from the applications that should be recovered. That means we relied on goroutine scheduling to hopefully do things correctly, i.e. events sent by the shim to create new apps would be processed before node recovery started.
   
   That is a dangerous assumption: I am filing a follow-up JIRA.

