Posted to notifications@couchdb.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/29 21:43:20 UTC

[jira] [Commented] (COUCHDB-3088) restart of couch_replication_server causes a stampede

    [ https://issues.apache.org/jira/browse/COUCHDB-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400044#comment-15400044 ] 

ASF GitHub Bot commented on COUCHDB-3088:
-----------------------------------------

GitHub user iilyak opened a pull request:

    https://github.com/apache/couchdb-couch-replicator/pull/44

    Inject random delays in scan_all_dbs

    couch_replication_server scans the filesystem to find all _replication
    databases. For every database found, it does
    
        gen_server:cast(Server, {resume_scan, DbName})
    
    Extract an independent process that does the gen_server:cast after a random
    delay. This effectively removes the stampede and randomizes the order in
    which we process _replication databases.
    
    COUCHDB-3088
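A minimal Erlang sketch of the random-delay idea described above. It assumes one short-lived process per database and uses rand:uniform/1 for the delay; the module and function names (scan_delay_sketch, schedule_resume_scans/3) and the MaxDelayMs bound are hypothetical, and the actual patch may structure the delayed casts differently.

```erlang
-module(scan_delay_sketch).
-export([schedule_resume_scans/3]).

%% Hypothetical helper (not the actual patch): for every DbName, spawn a
%% short-lived process that sleeps for a random delay and then casts
%% {resume_scan, DbName} to Server. The casts are therefore spread out
%% over time and arrive in random order instead of all at once.
%% MaxDelayMs must be a positive integer; each delay is drawn uniformly
%% from 1..MaxDelayMs milliseconds.
schedule_resume_scans(Server, DbNames, MaxDelayMs)
  when is_integer(MaxDelayMs), MaxDelayMs > 0 ->
    lists:foreach(
      fun(DbName) ->
              %% The scanning process does not block: each delayed cast
              %% runs in its own process.
              spawn(fun() ->
                            %% rand seeds itself on first use in each
                            %% process, so every delay is drawn independently.
                            timer:sleep(rand:uniform(MaxDelayMs)),
                            gen_server:cast(Server, {resume_scan, DbName})
                    end)
      end,
      DbNames).
```

Since gen_server:cast/2 only sends a {'$gen_cast', Msg} message, the sketch can be exercised with a plain process standing in for Server. The trade-off is that resuming all databases now takes up to MaxDelayMs instead of happening immediately.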

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloudant/couchdb-couch-replicator 69914-insert-random-delays

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch-replicator/pull/44.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #44
    
----
commit 5715c5e25dba61442834b08d7d7202b185341a87
Author: ILYA Khlopotov <ii...@ca.ibm.com>
Date:   2016-07-29T21:32:02Z

    Inject random delays in scan_all_dbs

----


> restart of couch_replication_server causes a stampede
> -----------------------------------------------------
>
>                 Key: COUCHDB-3088
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-3088
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: ILYA
>
> couch_replication_server scans all files in database_dir searching for files matching "_replicator.<number>.couch". For every _replication db, it does gen_server:cast(Server, {resume_scan, DbName}). This creates a stampede effect and causes sharp load spikes on the replication cluster. The problem gets worse if you migrate from an older version of CouchDB. In this case, there is logic that injects a validation ddoc into every _replication db, causing a spike in the [couchdb, database_writes] metric.
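The scan described in the issue filters database_dir for files named "_replicator.<number>.couch". A minimal Erlang sketch of that filename test, using re:run/3 on the file's basename; the module and function names are hypothetical, and the real server may match paths differently.

```erlang
-module(replicator_db_match_sketch).
-export([is_replicator_db_file/1]).

%% Hypothetical predicate: returns true when the basename of Path has the
%% form "_replicator.<number>.couch" described in the issue, else false.
is_replicator_db_file(Path) ->
    Base = filename:basename(Path),
    %% {capture, none} makes re:run/3 return just match | nomatch.
    case re:run(Base, "^_replicator\\.[0-9]+\\.couch$", [{capture, none}]) of
        match -> true;
        nomatch -> false
    end.
```

In the behaviour the issue describes, every file for which this test succeeds triggers an immediate resume_scan cast, so with many matching files those casts all arrive at once.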



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)