You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "wiegandf (via GitHub)" <gi...@apache.org> on 2023/03/15 11:39:10 UTC

[GitHub] [druid] wiegandf opened a new issue, #13936: Persist idle status of supervisors

wiegandf opened a new issue, #13936:
URL: https://github.com/apache/druid/issues/13936

   ### Description
   
   We've tested the new feature implemented in https://github.com/apache/druid/pull/13144 
   
   Is it intended to resume supervisors after an overlord restart? It would be helpful to persist the idle state for supervisors so that they don't create new tasks when overlord restarts for example.
   
   ### Motivation
   
   We're running druid in k8s and overlord could restart anytime. With the idle feature, we could save a lot of middlemanagers from running. Though, when overlord is restarted all supervisors will resume and create tasks for each supervisor and our running middlemanagers will be overloaded.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [I] Persist idle status of supervisors (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on issue #13936:
URL: https://github.com/apache/druid/issues/13936#issuecomment-1778144238

   Yeah, I think that's right, which is good. It means that supervisors being able to go idle immediately is actually even more OK than I thought 🙂
   
   One implementation note btw: it looks like currently the latest offsets are not populated on supervisor startup. It happens in a separate reporting thread. So there'd have to be an explicit initial population of those offsets prior to starting up the reporting thread.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [I] Persist idle status of supervisors (druid)

Posted by "georgew5656 (via GitHub)" <gi...@apache.org>.
georgew5656 commented on issue #13936:
URL: https://github.com/apache/druid/issues/13936#issuecomment-1777843748

   this makes sense to me, but wouldn't the behavior you describe be the same as the current behavior during steady state of the supervisor?
   
   if a message came in while the supervisor was idle and the overlord was restarting, computeTotalLag() == 0 would not be true and the supervisor would not be marked as idle when it first runs.
   
   if it came in after the supervisor started up and was set as idle, that's pretty much the same time frame as the supervisor being idle, a message coming in, and then the supervisor starting a new task during steady state operation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [I] Persist idle status of supervisors (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on issue #13936:
URL: https://github.com/apache/druid/issues/13936#issuecomment-1777699171

   From reading the code, as far as I can tell, the state is not persisted, and it isn't set back up on supervisor restart until `inactiveAfterMillis` has elapsed.
   
   I can think of a couple options for addressing this. One is that we could treat `idle` like `suspended`, by putting it in the supervisor spec and persisting it. However this seems weird to me, since it would involve the supervisor needing to edit its own spec once it enters idle state, and typically supervisors don't edit their own specs in the metadata store. So this wouldn't be my preferred choice.
   
   Another, IMO better approach is to adjust the logic for supervisor startup to get into the idle state immediately if that makes sense. Logic could be something like: if committed offsets match the current stream offsets, and no tasks are running, go into idle immediately. I think we check the current stream offsets once a minute, so that means the main risk is that we'll launch tasks up to a minute late in the case where the offsets did match, but then a new message comes in right after the supervisor starts up. But that should be rare, and if it does happen, it means messages are coming in quite infrequently. So it's probably fine to err on the side of staying idle for up to a minute.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org