Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/05 12:22:40 UTC

[GitHub] [druid] licl2014 opened a new issue #10242: Coordinator will spend a lot of time(many hours) to assign segments when a historical lost.

licl2014 opened a new issue #10242:
URL: https://github.com/apache/druid/issues/10242


   The Coordinator spends many hours assigning segments when a Historical that holds a large number of segments is lost, and realtime tasks cannot complete handoff during that time.
   `replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue` have no effect.
   
   ### Affected Version
   
   0.12.3
   
   ### Description
   
   - The lost Historical held 20 TB of data
   - Scheduling strategy is `cost`
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] licl2014 commented on issue #10242: Coordinator will spend a lot of time(many hours) to assign segments when a historical lost.

Posted by GitBox <gi...@apache.org>.
licl2014 commented on issue #10242:
URL: https://github.com/apache/druid/issues/10242#issuecomment-669801062


   We should limit the time the Coordinator spends scheduling in a single round when the `cost` strategy is used; otherwise, realtime task handoff is affected. @asdf2014 
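A minimal sketch of what such a per-round time limit could look like, written in Python for illustration only. It is not Druid code: `assign_with_budget`, `assign_one`, and `budget_seconds` are hypothetical names, and the real Coordinator is implemented in Java. The idea is to stop assigning once the budget for the round is used up and carry the remaining segments over to the next round, so that other duties such as handoff are not starved.

```python
import time


def assign_with_budget(segments, assign_one, budget_seconds, clock=time.monotonic):
    """Assign segments until the time budget is spent.

    Returns the segments that were not assigned, to be retried next round.
    All names here are hypothetical and only illustrate the proposal.
    """
    deadline = clock() + budget_seconds
    for index, segment in enumerate(segments):
        # Check the clock before each (potentially expensive) assignment.
        if clock() >= deadline:
            return segments[index:]
        assign_one(segment)
    return []


# Usage with a fake clock that advances one "second" per call.
ticks = iter(range(100))
assigned = []
left_over = assign_with_budget(
    list("abcde"), assigned.append, budget_seconds=3, clock=lambda: next(ticks)
)
```

With the fake clock above, two segments are assigned before the deadline and the other three are returned for the next round.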




[GitHub] [druid] licl2014 commented on issue #10242: Coordinator will spend a lot of time(many hours) to assign segments when a historical lost.

Posted by GitBox <gi...@apache.org>.
licl2014 commented on issue #10242:
URL: https://github.com/apache/druid/issues/10242#issuecomment-669228580


   The Historical has about 1 million segments. After it is lost (whether permanently or only temporarily offline), the Coordinator assigns replica segments to the other active Historicals in the next scheduling round. However, the `cost` algorithm is time-consuming, so `replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue` have no effect (the Historicals load segments faster than the Coordinator assigns them), and the Coordinator hangs for hours.
   `cachingCost` does not have this problem: the algorithm is very efficient, so the Coordinator assigns segments faster than the Historicals load them, and `maxSegmentsInNodeLoadingQueue` and `replicationThrottleLimit` take effect. @asdf2014 
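The reasoning above can be illustrated with a toy model, sketched here in Python. It is not Druid code, and the rates and the queue limit are made-up numbers chosen only to show the effect: when the Coordinator assigns more slowly than a Historical loads, the loading queue never gets near its limit, so a queue-size cap never becomes the binding constraint.

```python
def simulate_queue(assign_per_sec, load_per_sec, queue_limit, seconds):
    """Return the peak loading-queue depth in a simple per-second model.

    Hypothetical model: one Coordinator feeding one Historical's load queue.
    """
    depth = 0
    peak = 0
    for _ in range(seconds):
        # The Coordinator adds segments only while the queue is below its limit.
        depth += min(assign_per_sec, max(queue_limit - depth, 0))
        peak = max(peak, depth)
        # The Historical drains the queue at its own load rate.
        depth = max(depth - load_per_sec, 0)
    return peak


# Slow assignment (as described for `cost`): the limit is never reached.
slow_peak = simulate_queue(assign_per_sec=5, load_per_sec=50, queue_limit=1000, seconds=600)

# Fast assignment (as described for `cachingCost`): the limit caps the queue.
fast_peak = simulate_queue(assign_per_sec=500, load_per_sec=50, queue_limit=1000, seconds=600)
```

In this model the slow case peaks at 5 queued segments, far below the limit of 1000, while the fast case is held at exactly 1000.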




[GitHub] [druid] asdf2014 commented on issue #10242: Coordinator will spend a lot of time(many hours) to assign segments when a historical lost.

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on issue #10242:
URL: https://github.com/apache/druid/issues/10242#issuecomment-669653541


   @licl2014 It sounds like the bottleneck is in the calculation of the allocation strategy, but if you switch to the `cachingCost` allocation strategy, you need to be aware that memory consumption may increase.




[GitHub] [druid] asdf2014 commented on issue #10242: Coordinator will spend a lot of time(many hours) to assign segments when a historical lost.

Posted by GitBox <gi...@apache.org>.
asdf2014 commented on issue #10242:
URL: https://github.com/apache/druid/issues/10242#issuecomment-669169703


   Hi @licl2014. Can you provide more information? For example: How many segments were on the failed Historical node? Does the failed Historical node need to be offline permanently or just temporarily? Is the bottleneck in server resources (CPU, disk, or network bandwidth)? Or are the server resources mostly idle while the Coordinator spends its time calculating the allocation strategy?

