Posted to dev@couchdb.apache.org by "Paul Hirst (JIRA)" <ji...@apache.org> on 2011/07/21 12:40:57 UTC
[jira] [Updated] (COUCHDB-1230) Replication slows down over time

     [ https://issues.apache.org/jira/browse/COUCHDB-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Hirst updated COUCHDB-1230:
--------------------------------

    Attachment: sequence_number.png

This is the last sequence number of a replication target graphed over ~20 hours.

It shows that restarting replication gives a speed boost and that after a while the speed diminishes.
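
The slowdown in the graph can be put into numbers by turning the sampled sequence numbers into a rate. The following is only a minimal sketch: the message does not say how the samples were collected, so the `(timestamp, sequence number)` pairs, the `seq_rates` helper and the sample values are all hypothetical and are not the measurements behind sequence_number.png.

```python
from typing import List, Tuple

# One sample is (unix timestamp in seconds, sequence number of the target).
Sample = Tuple[float, int]


def seq_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (timestamp, updates per second) for each consecutive pair of samples."""
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((t1, (s1 - s0) / (t1 - t0)))
    return rates


# Hypothetical samples taken one minute apart (NOT data from the attachment).
samples = [(0.0, 0), (60.0, 6000), (120.0, 9000), (180.0, 9600)]

for ts, rate in seq_rates(samples):
    print(f"t={ts:6.0f}s  {rate:.1f} updates/s")
```

A falling rate between restarts, followed by a jump right after each restart, would correspond to the pattern described above.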

> Replication slows down over time
> --------------------------------
>
>                 Key: COUCHDB-1230
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1230
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.0.2, 1.1
>         Environment: Ubuntu 10.04, 
>            Reporter: Paul Hirst
>         Attachments: sequence_number.png
>
>
> I have two databases which were replicated in the past. One is running 1.0.2; I shall call this the source database. The other is running 1.1.0; I shall call this the target database.
> The source and target are bidirectionally replicated using a push and pull replication from the target (using a couple of documents in the new _replicator database).
> The source database is in production and is getting changes applied to it from live systems. The target is only participating in replication and is not being used directly by any production systems.
> The database has about 50 million documents, many of which have been updated a handful of times. The database is about 500G after compaction, but the source database is currently at about 900G as it hasn't been compacted for a while.
> The databases were replicated in the past; however, this replication was torn down when the target was upgraded from 1.0.2 to 1.1.0. When replication was re-enabled, the system wasn't able to pick up where it left off and had to re-enumerate all the documents. This process initially started quickly but after a while ground to a halt, such that the target actually stopped making progress against the source database.
> I found that restarting replication starts the process running again at a decent speed for a while. I did this by deleting and recreating the appropriate document in the _replicator database on the target.  
> I have graphed the last_seq of the target database against time for about a day, noting when replication was manually restarted. I shall try to attach the graph if possible. It shows a clear improvement in replication speed after restarting replication.
> I previously witnessed this behaviour between 1.0.2 databases, although I didn't grab any stats at the time, so I don't think it's a new problem.
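
The workaround described in the report (deleting and recreating the document in the _replicator database) might look roughly like the sketch below. It is an illustration under assumptions, not the reporter's actual procedure: the base URL, the document id `pull_from_source`, the example document and the helper names are hypothetical, no authentication is handled, and it assumes the fields CouchDB adds to the document itself all start with an underscore (such as `_replication_state`).

```python
import json
import urllib.parse
import urllib.request


def restart_requests(base_url, doc_id, current_doc):
    """Build the DELETE and PUT requests that recreate a _replicator document."""
    quoted_id = urllib.parse.quote(doc_id, safe="")
    doc_url = f"{base_url.rstrip('/')}/_replicator/{quoted_id}"
    rev = urllib.parse.quote(current_doc["_rev"], safe="")
    # Keep the user-supplied fields; drop _rev and the underscore-prefixed
    # fields that the server is assumed to have written into the document.
    fresh = {k: v for k, v in current_doc.items()
             if k == "_id" or not k.startswith("_")}
    return [
        ("DELETE", f"{doc_url}?rev={rev}", None),
        ("PUT", doc_url, json.dumps(fresh).encode("utf-8")),
    ]


def send(method, url, body=None):
    """Send one HTTP request and decode the JSON response."""
    req = urllib.request.Request(url, data=body, method=method)
    if body is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def restart_replication(base_url, doc_id):
    """Fetch the current document, delete it, then create it again."""
    quoted_id = urllib.parse.quote(doc_id, safe="")
    current = send("GET", f"{base_url.rstrip('/')}/_replicator/{quoted_id}")
    for method, url, body in restart_requests(base_url, doc_id, current):
        send(method, url, body)


# Hypothetical document, used only to show which requests would be sent.
example_doc = {
    "_id": "pull_from_source",
    "_rev": "3-abc",
    "source": "http://source.example:5984/db",
    "target": "db",
    "continuous": True,
    "_replication_state": "triggered",
}
for method, url, _ in restart_requests("http://localhost:5984", "pull_from_source", example_doc):
    print(method, url)
```

Nothing is sent over the network until `restart_replication` is called; the loop at the end only prints the requests that would be made.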
