Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2016/02/20 01:58:18 UTC

[jira] [Commented] (SOLR-8707) Distribute (auto)commit requests evenly over time in multi shard/replica collections

    [ https://issues.apache.org/jira/browse/SOLR-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155241#comment-15155241 ] 

Hoss Man commented on SOLR-8707:
--------------------------------

bq. For example, if there are 6 cores and the auto commit time is 60 seconds, the first core commits without delay, the second core does its first commit after 10 seconds and commits at 60 second intervals afterwards, and so on.

interesting ... a naive effort for individual cores to "space themselves out" in time could probably be done fairly trivially when initializing the auto commit timers on core load w/o a lot of continual coordination even if replicas are added/removed over time:

if ZK mode:
* determine what shard we are
* request a list of all (known) replicas for our shard (even if they aren't currently active)
* sort list of replicas by name, and locate our position N in the list and the list size S
* assign "delayUnit = autoCommitTime / S"
* set an initial delay on the auto commit timer thread to "(delayUnit * N) + rand(0, delayUnit)"

(The small amount of randomness seems like a good idea to me in case some replica is replaced by a new replica with a different name, causing a different existing replica (that doesn't yet know about the change to the list of all replicas) to shift up/down one in the list and think it has the same N as the new replica)
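
The steps above can be sketched roughly as follows. This is only an illustration of the arithmetic, not Solr code: the class name CommitStagger, the method initialDelayMs, and the replica names are all hypothetical, and fetching the replica list from ZK and wiring the result into the auto commit timer are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Solr: computes a staggered initial delay
// for a replica's auto commit timer from its position in the sorted replica list.
public class CommitStagger {

    /**
     * @param replicaNames     all known replicas of our shard, active or not
     *                         (assumed to have been read from ZK elsewhere)
     * @param myName           the name of the replica doing the calculation
     * @param autoCommitTimeMs the configured auto commit interval in milliseconds
     * @param rand             source of the small random offset
     * @return initial delay in ms: (delayUnit * N) + rand(0, delayUnit)
     */
    public static long initialDelayMs(List<String> replicaNames, String myName,
                                      long autoCommitTimeMs, Random rand) {
        // Sort a copy so every replica derives the same ordering.
        List<String> sorted = new ArrayList<>(replicaNames);
        Collections.sort(sorted);

        int n = sorted.indexOf(myName);   // our position N in the list
        if (n < 0) {
            throw new IllegalArgumentException("unknown replica: " + myName);
        }
        int s = sorted.size();            // list size S

        long delayUnit = autoCommitTimeMs / s;
        // Random offset in [0, delayUnit) so two replicas that briefly
        // compute the same N are unlikely to commit at the same instant.
        long jitter = (long) (rand.nextDouble() * delayUnit);
        return delayUnit * n + jitter;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("core_node3", "core_node1", "core_node2");
        Random rand = new Random();
        for (String name : replicas) {
            long delay = initialDelayMs(replicas, name, 60_000L, rand);
            System.out.println(name + " -> initial delay " + delay + " ms");
        }
    }
}
```

With 3 replicas and a 60 second auto commit time, delayUnit is 20 seconds, so the replicas land somewhere in the windows 0-20s, 20-40s and 40-60s respectively, in sorted name order.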



> Distribute (auto)commit requests evenly over time in multi shard/replica collections
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-8707
>                 URL: https://issues.apache.org/jira/browse/SOLR-8707
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Michael Sun
>
> In the current implementation, all Solr nodes start commits for all cores in a collection at almost the same time. As a result, this creates a load spike in the cluster at regular intervals, particularly when the collection is on HDFS. The main reason is that all cores of a collection are created at almost the same time and commit at a fixed interval afterwards.
> It would be good to distribute the commit load evenly to avoid load spikes. This helps improve performance and reliability in general.


