Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/12/21 05:18:51 UTC

[GitHub] [druid] jihoonson opened a new issue #9087: A simulator for segment balancing by the coordinator

jihoonson opened a new issue #9087: A simulator for segment balancing by the coordinator
URL: https://github.com/apache/druid/issues/9087
 
 
   ### Motivation
   
   One of the coordinator's responsibilities is balancing segments across historicals. We support several balancer strategies, including the cost balancer strategy implemented in https://github.com/apache/druid/pull/2972. The cost balancer strategy (and its variants) is probably the most popular strategy at the moment. It works well in most production cases, but it can sometimes lead to an imbalanced segment distribution. Because segment balancing happens over a long period, it is hard to debug why the balancer sometimes makes a suboptimal decision.
   
   ### Proposed changes
   
   This proposal is to add a new tool that emulates the coordinator's segment balancing and reports metrics. It would accept the following input configurations.
   
   - Cluster configuration
   	- Per tier
   		- node spec (capacity)
   		- number of nodes
   		- number of decommissioned nodes
   - Data source configuration
   	- number of data sources
   	- number of replicas per data source
   	- number of new segments per _n_ runs per data source
   	- probability that a historical finishes loading an assigned segment by the next run
   - Segment balancing configuration
   	- balancer strategy
   	- maxSegmentsToMove
   	- replicationThrottleLimit
   	- balancerComputeThreads
   	- maxSegmentsInNodeLoadingQueue
   	- decommissioningMaxPercentOfMaxSegmentsToMove
   - Number of runs to perform segment balancing
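
One way to picture the inputs listed above is as a small set of typed configuration objects. The following is only a sketch in Python: every class and field name is hypothetical and not part of Druid, and only the balancing option names are taken from the list above. No default values are assumed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TierConfig:
    """Hypothetical per-tier cluster settings."""
    node_capacity_bytes: int
    num_nodes: int
    num_decommissioned_nodes: int

    def __post_init__(self) -> None:
        if not 0 <= self.num_decommissioned_nodes <= self.num_nodes:
            raise ValueError("decommissioned nodes must be between 0 and num_nodes")


@dataclass
class DataSourceConfig:
    """Hypothetical data source settings."""
    num_data_sources: int
    replicas_per_data_source: int
    new_segments_per_data_source: int  # new segments added every n runs
    new_segment_interval_runs: int     # the n in "per n runs"
    load_probability: float            # chance a segment is loaded by the next run

    def __post_init__(self) -> None:
        if not 0.0 <= self.load_probability <= 1.0:
            raise ValueError("load_probability must be within [0, 1]")
        if self.new_segment_interval_runs < 1:
            raise ValueError("new_segment_interval_runs must be at least 1")


@dataclass
class BalancingConfig:
    """Balancing options named in the proposal; values are supplied by the user."""
    balancer_strategy: str
    max_segments_to_move: int
    replication_throttle_limit: int
    balancer_compute_threads: int
    max_segments_in_node_loading_queue: int
    decommissioning_max_percent_of_max_segments_to_move: int


@dataclass
class SimulatorConfig:
    """Top-level input: tiers keyed by a tier name, plus the other sections."""
    tiers: Dict[str, TierConfig]
    data_sources: DataSourceConfig
    balancing: BalancingConfig
    num_runs: int
```

Validating the inputs when they are constructed keeps an invalid probability or node count from surfacing only after many simulated runs.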
   
   The resulting metrics, listed below, would be reported both across all data sources and per data source.
   
   - Mean, avg, min, max, standard deviation, coefficient of variation
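
The proposal does not say which quantity these statistics are computed over. As a sketch, assuming they describe the number of segments held by each historical, they could be computed as below. The function name and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics
from typing import Dict, Sequence


def distribution_metrics(segment_counts: Sequence[int]) -> Dict[str, float]:
    """Summarize how evenly segments are spread across historicals.

    segment_counts holds one entry per historical (an assumption; the
    proposal does not define the unit being measured).
    """
    if not segment_counts:
        raise ValueError("segment_counts must not be empty")
    mean = statistics.fmean(segment_counts)
    # Population standard deviation: the counts cover every node, not a sample.
    stddev = statistics.pstdev(segment_counts)
    # Coefficient of variation is undefined for a zero mean; report 0.0 then.
    cv = stddev / mean if mean else 0.0
    return {
        "mean": mean,
        "min": float(min(segment_counts)),
        "max": float(max(segment_counts)),
        "stddev": stddev,
        "cv": cv,
    }
```

The coefficient of variation is the most convenient of these for comparing clusters of different sizes, because it normalizes the spread by the mean.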
   
   ### Rationale
   
   Problems in segment balancing usually appear only after a production cluster has been running for a while. They are not easy to replicate locally or in a separate test cluster. Soak testing is also difficult because it sometimes requires running the cluster for a fairly long time to replicate the problem.
   
   ### Operational impact
   
   There is no operational impact.
   
   ### Future work
   
   - The simulator would consider randomly appearing and disappearing historicals. 
   - It would also generate a report of the distribution change over time.
   - Debugging mode
       - Mode running the simulator until some defined conditions are met (such as max allowed standard deviation across data sources)
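
The debugging mode above amounts to a loop that keeps running simulated balancing steps until a stop condition holds. A rough sketch in Python follows, assuming the condition is a maximum standard deviation of per-historical segment counts. `move_one_segment` is a deliberately naive placeholder and not the cost balancer strategy; both function names are hypothetical.

```python
import statistics
from typing import Callable, List, Tuple


def run_until_balanced(
    counts: List[int],
    step: Callable[[List[int]], None],
    max_stddev: float,
    max_runs: int,
) -> Tuple[int, bool]:
    """Apply step until the stddev of counts is at most max_stddev.

    Returns (runs performed, whether the condition was met). max_runs
    bounds the loop so that a strategy that never converges still stops.
    """
    runs = 0
    while statistics.pstdev(counts) > max_stddev:
        if runs == max_runs:
            return runs, False
        step(counts)
        runs += 1
    return runs, True


def move_one_segment(counts: List[int]) -> None:
    """Placeholder balancer: move one segment from the fullest node to the emptiest."""
    src = counts.index(max(counts))
    dst = counts.index(min(counts))
    if counts[src] - counts[dst] > 1:
        counts[src] -= 1
        counts[dst] += 1


if __name__ == "__main__":
    counts = [10, 0]
    print(run_until_balanced(counts, move_one_segment, max_stddev=1.0, max_runs=100))
    print(counts)
```

Passing the balancing step in as a function means the same loop could drive any strategy under test, and the returned run count shows how long convergence took.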

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on issue #9087: A simulator for segment balancing by the coordinator

Posted by GitBox <gi...@apache.org>.
samarthjain commented on issue #9087: A simulator for segment balancing by the coordinator
URL: https://github.com/apache/druid/issues/9087#issuecomment-568344628
 
 
   @jihoonson - DiskNormalizedCostBalancerStrategy has been working out quite well for us in general, whereas CostBalancerStrategy generally leads to a non-uniform distribution, at least when a new cluster is spun up. Possibly the default balancer strategy should be switched to DiskNormalizedCostBalancerStrategy? Unless, of course, other folks have run into other issues with it.



[GitHub] [druid] jihoonson commented on issue #9087: A simulator for segment balancing by the coordinator

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9087: A simulator for segment balancing by the coordinator
URL: https://github.com/apache/druid/issues/9087#issuecomment-568366049
 
 
   That’s interesting. I think I’ve seen a similar skewed distribution with CostBalancerStrategy, which went away when we switched to DiskNormalizedCostBalancerStrategy. I think it’s a good idea to make it the default, but do you know what is causing the skewed distribution?
