
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/20 17:07:41 UTC

[GitHub] [druid] tanisdlj opened a new issue #11277: diskNormalized strategy ignored by coordinator

tanisdlj opened a new issue #11277:
URL: https://github.com/apache/druid/issues/11277


   ### Affected Version
   
   0.20.1
   
   ### Description
   
   - Cluster size: 20 Historicals, 32 MiddleManagers, 2 Coordinators, 2 Overlords, 5 Brokers, 5 Routers, 2,379,776 segments, ~70 TB of data.
   
   With the `diskNormalized` strategy, segment loading does not distribute the data evenly across the servers. See the picture below.
   
   ![image](https://user-images.githubusercontent.com/1453135/119019510-8ce8c080-b99d-11eb-87f9-e7abd8a762c5.png)
   
   I can see lines like this in the logs:
   ```
   org.apache.druid.server.coordinator.CostBalancerStrategy - Cost Balancer Multithread strategy wasn't able to complete cost computation.: {class=org.apache.druid.server.coordinator.CostBalancerStrategy, exceptionType=class java.lang.InterruptedException, exceptionMessage=null}
   ```
   
   My runtime config looks like this:
   ```
   druid.service=druid/coordinator
   druid.plaintextPort=8081
   druid.coordinator.startDelay=PT300S
   druid.coordinator.period=PT60S
   
   druid.coordinator.kill.on=true
   druid.coordinator.kill.maxSegments=100
   druid.coordinator.kill.durationToRetain=P7D
   
   druid.serverview.type=http
   druid.coordinator.loadqueuepeon.type=http
   # Number of segment load/drop requests to batch in one HTTP request.
   # Note that it must be smaller than druid.segmentCache.numLoadingThreads config on Historical process.
   druid.coordinator.loadqueuepeon.http.batchSize=56
   
   druid.coordinator.loadqueuepeon.curator.numCallbackThreads=200
   druid.coordinator.balancer.strategy=diskNormalized
   #druid.coordinator.balancer.strategy=cost
   
   # Reduces the max number of Segments loaded (speed) for a more even distribution between nodes
   # Default is 0 (max speed, unfair distribution)
   maxSegmentsInNodeLoadingQueue=1000
   
   druid.announcer.type=http
   ```
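
One thing that may be worth checking in a config like the one above is whether every key is a runtime property at all. As far as I know, `maxSegmentsInNodeLoadingQueue` is documented as a Coordinator dynamic configuration (set through the Coordinator API or the web console), so a bare line in the runtime properties file would likely have no effect. The sketch below is a hypothetical helper, not part of Druid, that flags keys without the `druid.` prefix; the function name `find_unprefixed_keys` and the prefix heuristic are assumptions for illustration only.

```python
def find_unprefixed_keys(properties_text, prefix="druid."):
    """Return keys in a Java-properties-style text that do not start with `prefix`.

    Heuristic only: blank lines and lines starting with '#' or '!' are skipped,
    and only the simple `key=value` form is handled.
    """
    suspicious = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key = line.split("=", 1)[0].strip()
        if not key.startswith(prefix):
            suspicious.append(key)
    return suspicious


# Excerpt of the coordinator config from this report.
sample = """
druid.coordinator.balancer.strategy=diskNormalized
#druid.coordinator.balancer.strategy=cost
maxSegmentsInNodeLoadingQueue=1000
druid.announcer.type=http
"""

print(find_unprefixed_keys(sample))
```

A flagged key is only a hint to look it up in the configuration reference; some valid settings may legitimately lack the prefix.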
   
   It feels like the coordinator "obsesses" over two or three servers and makes them load all the segments. This happened across several coordinator restarts and after switching from one coordinator server to the other. It still throws most of our segments onto 3 of our 12 servers (and actually surpasses the max disk size set in the runtime config).
   
   Loading the segments took around 2-3 days, so it was not as if the coordinator issued the order to load N segments and "boom", the server was suddenly full. In the picture you can see that the server is already full but still loading segments, while the empty ones sit idle.
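
To put a number on the imbalance shown in the picture, the per-server fill ratio can be computed from each Historical's current and maximum size. The sketch below assumes server records shaped like the Coordinator's simple server listing (`/druid/coordinator/v1/servers?simple`), with `host`, `currSize`, and `maxSize` fields. The host names and byte counts are made-up sample values, not data from this cluster, and `usage_report` is a hypothetical helper.

```python
def usage_report(servers):
    """Return (host, used_fraction) pairs sorted from fullest to emptiest."""
    report = []
    for server in servers:
        max_size = server["maxSize"]
        # Guard against a zero maxSize to avoid division by zero.
        ratio = server["currSize"] / max_size if max_size else float("inf")
        report.append((server["host"], ratio))
    return sorted(report, key=lambda item: item[1], reverse=True)


# Hypothetical sample data; sizes are in bytes.
servers = [
    {"host": "historical-01:8083", "currSize": 5_500, "maxSize": 5_000},
    {"host": "historical-02:8083", "currSize": 2_500, "maxSize": 5_000},
    {"host": "historical-03:8083", "currSize": 500, "maxSize": 5_000},
]

report = usage_report(servers)
overfull = [host for host, ratio in report if ratio > 1.0]
spread = report[0][1] - report[-1][1]

for host, ratio in report:
    print(f"{host}: {ratio:.0%} of maxSize")
print("over maxSize:", overfull)
print(f"spread between fullest and emptiest: {spread:.0%}")
```

A large spread, or any server above 100% of `maxSize`, would match the behavior described in this report.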


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] pjain1 commented on issue #11277: diskNormalized strategy ignored by coordinator

Posted by GitBox <gi...@apache.org>.

pjain1 commented on issue #11277:
URL: https://github.com/apache/druid/issues/11277#issuecomment-846128041


   I have observed this issue as well. Try `cachingCost`; IMO it works the best. `cachingCost` has a minor issue, though: it does not load multiple segments that have a segment granularity of `ALL`. That should not actually happen in production, but it is something to keep in mind.



[GitHub] [druid] tanisdlj commented on issue #11277: diskNormalized strategy ignored by coordinator

Posted by GitBox <gi...@apache.org>.

tanisdlj commented on issue #11277:
URL: https://github.com/apache/druid/issues/11277#issuecomment-851401679


   I think we had the same problem with `cost` or `cachingCost`: the coordinator filled some servers to over 100% disk usage while leaving others half full.

