Posted to dev@couchdb.apache.org by "Isaac Z. Schlueter (JIRA)" <ji...@apache.org> on 2014/02/25 17:54:20 UTC

[jira] [Created] (COUCHDB-2102) Downstream replicator database bloat

Isaac Z. Schlueter created COUCHDB-2102:
-------------------------------------------

             Summary: Downstream replicator database bloat
                 Key: COUCHDB-2102
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2102
             Project: CouchDB
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: Replication
            Reporter: Isaac Z. Schlueter


When I do continuous replication from one database to another, the downstream replica accumulates a lot of on-disk bloat over time.
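
Roughly how the replication is set up, sketched with placeholder host and database names (a pull replication into the downstream server; auth omitted for brevity):

{code}
import json
import urllib.request

def start_continuous_replication(couch_url, source, target):
    # POST a continuous replication job to the downstream server's /_replicate.
    body = json.dumps({
        "source": source,        # upstream db URL, e.g. "http://upstream:5984/_users"
        "target": target,        # local downstream db name
        "continuous": True,      # keep pulling new changes as they arrive
        "create_target": True,   # create the downstream db if it doesn't exist
    }).encode("utf-8")
    req = urllib.request.Request(
        couch_url + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# start_continuous_replication("http://downstream:5984",
#                               "http://upstream:5984/_users", "_users")
{code}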

For example, after two weeks of continuously replicating a _users db with a relatively low write rate and around 30,000 documents, the downstream replica's size on disk was over 300MB.  Compacting it brought the size down to about 20MB (slightly smaller than the source database).
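
The overhead is easy to see by comparing data_size to disk_size in the database info document (1.x field names), and compaction can be kicked off by hand.  A minimal sketch, with an illustrative URL and threshold, and auth omitted:

{code}
import json
import urllib.request

def compact_if_bloated(db_url, max_overhead=4.0):
    # GET the database info document and compare logical vs. on-disk size.
    with urllib.request.urlopen(db_url) as resp:
        info = json.loads(resp.read())
    disk, data = info["disk_size"], info["data_size"]
    print("disk_size=%d data_size=%d (%.1fx)" % (disk, data, float(disk) / max(data, 1)))
    if disk > data * max_overhead:
        # POST /{db}/_compact; returns 202 Accepted and compacts in the background.
        req = urllib.request.Request(
            db_url + "/_compact",
            data=b"{}",
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)

# compact_if_bloated("http://downstream:5984/_users")
{code}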

Of course, I realize that I can configure compaction to happen regularly.  But this still seems like a rather excessive tax.  It is especially shocking to users who replicate a 100GB database full of attachments, only to find it has grown to 400GB because they weren't careful!  You can easily end up in a situation where you don't have enough free disk space to compact successfully.
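
(For completeness, the regular compaction I have in mind is the compaction daemon's fragmentation rule; something along these lines should turn it on, though the thresholds are only examples and admin credentials are assumed:)

{code}
import json
import urllib.request

def enable_auto_compaction(couch_url, db_frag="70%", view_frag="60%"):
    # Set the compaction daemon's default rule via the /_config API.
    # Equivalent to putting the rule under [compactions] _default in local.ini.
    rule = '[{db_fragmentation, "%s"}, {view_fragmentation, "%s"}]' % (db_frag, view_frag)
    req = urllib.request.Request(
        couch_url + "/_config/compactions/_default",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # returns the previous value, if any

# enable_auto_compaction("http://downstream:5984")  # admin auth handled separately
{code}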

Is there a fundamental reason why this happens?  Or has it simply never been a priority?  It'd be awesome if replication were more efficient with disk space.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)