Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/02 21:43:55 UTC

[GitHub] [druid] mounikanakkala opened a new issue #10940: Coordinator loads all the data only in few historicals of a tier, leaving the rest almost empty

mounikanakkala opened a new issue #10940:
URL: https://github.com/apache/druid/issues/10940


   ### Affected Version
   
   0.20.0
   
   ### Description
   
   The Coordinator picks only a few historicals in a tier and loads all the data onto them. The rest are left almost empty.
   
   ### Series of events
    - We have a tier that had four historicals to begin with. All four reached 88% disk usage.
    - We added two historicals with exactly the same configuration. Rebalancing started, and we observed that max (disk storage on any instance) = 67%. This took 40 minutes.
    - Over the next 40 minutes, it continued to rebalance by draining data from the existing historicals and adding it to the two newly added historicals. The disk usage on the historicals was then 40%, 32%, 99.4%, 99.5%, 48%, 37%. It didn't stop at 99.4%; it kept adding data to the same two instances, which were almost 100% full.
    - We added two more instances. Rebalancing started again.
    - Now we have 8 instances in total. In the current configuration, 4 instances are at 84% and the other 4 are at 5%.
    - We observed that the Coordinator was draining segments from four of the instances and distributing them among the other four.
   
   ### Balancing Strategy
   We were reading about the Coordinator balancing strategy and we have some questions about it:
   - Why did the Coordinator put most of the data on only 4 historicals when we have 8 historicals in total?
   - Under the "Balancing segment load" section in the [Coordinator documentation page](https://druid.apache.org/docs/latest/design/coordinator.html), we have the following sentence:
   `For every Historical process tier in the cluster, the Coordinator process will determine the Historical process with the highest utilization and the Historical process with the lowest utilization. The percent difference in utilization between the two processes is computed, and if the result exceeds a certain threshold, a number of segments will be moved from the highest utilized process to the lowest utilized process.`
   Can you please explain what that threshold is and where we can find its value? We tried to go through the Druid code on GitHub but could not find it.
   Is this the reason why only 4 instances have most of the data?
   - We did not set `druid.coordinator.balancer.strategy`, so we must be using the default, which is `cost` on version 0.20.0.
   - We know that `cachingCost` and `diskNormalized` are two other options. If we want an equal distribution of segments, we might have to use `diskNormalized`. Are there any problems with this configuration? Do we need to change other configurations so that `diskNormalized` works well?
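
To make the question about the threshold concrete, here is a minimal Python sketch of the rule as the quoted documentation sentence describes it. It is only an illustration of that sentence, not Druid's implementation, and the default `cost` strategy may not behave this way. The function name `pick_move`, the historical names `h1`..`h6`, and the `threshold_pct` value of 10 are all hypothetical; the actual threshold is exactly what we could not find. "Percent difference" is assumed here to mean the gap in percentage points.

```python
from typing import Dict, Optional, Tuple


def pick_move(utilization: Dict[str, float], threshold_pct: float) -> Optional[Tuple[str, str]]:
    """Return (source, destination) if the utilization gap exceeds the threshold.

    utilization maps a historical name to its disk usage in percent.
    threshold_pct is a hypothetical parameter; the real value is unknown to us.
    """
    if len(utilization) < 2:
        return None  # nothing to balance with fewer than two historicals

    highest = max(utilization, key=utilization.get)
    lowest = min(utilization, key=utilization.get)

    # Assumption: "percent difference" is the gap in percentage points.
    gap = utilization[highest] - utilization[lowest]
    if gap > threshold_pct:
        return highest, lowest
    return None


# Disk usage reported above after the second 40 minutes (names are hypothetical).
reported = {"h1": 40.0, "h2": 32.0, "h3": 99.4, "h4": 99.5, "h5": 48.0, "h6": 37.0}
print(pick_move(reported, threshold_pct=10.0))
```

Under this reading, segments should flow away from the fullest historical and toward the emptiest one, which is the opposite of what we observed on the two instances at 99.4% and 99.5%.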


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liubo-it commented on issue #10940: Coordinator loads all the data only in few historicals of a tier, leaving the rest almost empty

Posted by GitBox <gi...@apache.org>.
liubo-it commented on issue #10940:
URL: https://github.com/apache/druid/issues/10940#issuecomment-845740413


   @mounikanakkala  
   
   I also encountered the same problem. How did you solve it?




[GitHub] [druid] mounikanakkala commented on issue #10940: Coordinator loads all the data only in few historicals of a tier, leaving the rest almost empty

Posted by GitBox <gi...@apache.org>.
mounikanakkala commented on issue #10940:
URL: https://github.com/apache/druid/issues/10940#issuecomment-790596843


   We tried the `cachingCost` strategy. The result is not exactly balanced, but it is close. This seems to be a good strategy, especially since it is recommended.
   
   But if we can get an explanation of what could have happened and any caveats we need to know about cachingCost, that'll be helpful.

