You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/01/28 07:57:22 UTC

[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on pull request #224: [YUNIKORN-518] Placeholder manager failed to init during scheduler recovery

wilfred-s commented on pull request #224:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/224#issuecomment-768871332


   No, if we cannot work without a `PlaceholderManager` initialised we should not start up. We need tests for that.
   What you are doing now is that we crash the scheduler when, by chance, a task group is requested by an application. That might not come for minutes, hours or even days after starting up. So for no reason you could crash the scheduler after running for days without an issue.
   That is not acceptable, and a really bad user experience.
   
   I agree that the fix is to correctly and always initialise the `PlaceholderManager`. However since we have no `nil` checks in the code for `GetPlacholderManager()` we need to make sure that we take down the scheduler immediately if it is not initialised correctly. We do the same when the `AppManagementService` does not start correctly.
   Creation of the manager should be part of the `newShimScheduler()` and then in the run we need to check it (not sure if there is anything we need to do beside a simple not nil check) and start it. When it is not correct we call `ss.stop()` which  should also be nil safe to not cause a panic on shutdown.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org