Posted to user@couchdb.apache.org by Matthew Sinclair-Day <ms...@gmail.com> on 2010/03/15 20:09:07 UTC

Replicated database size

Hi folks,

I've been putting CouchDB 0.10.1 on Solaris 10/x86 through its paces 
lately, trying to understand its replication performance and 
behavior, and have noticed that the size of pre-compacted replicas 
can vary from one host to another.

In one test, the origin has roughly 1.2 million documents taking 
up 263MB of storage, but replicated size varies from one server 
to another:

origin   : 263MB
replica 1: 0.6GB
replica 2: 0.7GB
replica 3: 1.0GB

As expected the replicas are larger than the compacted origin 
database, but I didn't expect such size differences from replica 
to replica.

After compacting the origin (again) and the replicas, their 
sizes settle down to:

origin   : 262.4MB
replica 1: 262.4MB
replica 2: 262.5MB
replica 3: 262.4MB
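
For anyone wanting to repeat this comparison, here is a minimal Python sketch of reading a database's on-disk size and triggering compaction over CouchDB's HTTP API. The base URL and database name are placeholders, not values from this thread. The `disk_size` field is what older releases report from `GET /{db}`; newer releases report `sizes.file` instead, so the helper checks both. Newer releases may also require admin credentials for `_compact`, which this sketch does not handle.

```python
import json
from urllib.request import Request, urlopen


def db_file_size(info):
    """Return the on-disk file size in bytes from a GET /{db} response body."""
    if "disk_size" in info:
        # Field reported by older CouchDB releases.
        return info["disk_size"]
    # Field reported by newer CouchDB releases.
    return info["sizes"]["file"]


def fetch_db_info(base_url, db):
    """GET /{db} and return the parsed JSON document describing the database."""
    with urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)


def start_compaction(base_url, db):
    """POST /{db}/_compact; compaction then runs in the background."""
    req = Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

Against a running server, something like `db_file_size(fetch_db_info("http://localhost:5984", "mydb"))` before and after `start_compaction(...)` would give the two numbers to compare for each host (both arguments here are hypothetical).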

I'm trying to understand what the reason could be for the 
variance in pre-compacted database sizes.  All replicas are 
running the same build of CouchDB on the same version of 
Solaris, though replica3 is running on newer hardware in a 
VMWare container.

Matt


Re: Replicated database size

Posted by Adam Kocoloski <ko...@apache.org>.
On Mar 15, 2010, at 3:09 PM, Matthew Sinclair-Day wrote:

> Hi folks,
> 
> I've been putting CouchDB 0.10.1 on Solaris 10/x86 through its paces lately, trying to understand its replication performance and behavior, and have noticed that the size of pre-compacted replicas can vary from one host to another.
> 
> In one test, the origin has roughly 1.2 million documents taking up 263MB of storage, but replicated size varies from one server to another:
> 
> origin   : 263MB
> replica 1: 0.6GB
> replica 2: 0.7GB
> replica 3: 1.0GB
> 
> As expected the replicas are larger than the compacted origin database, but I didn't expect such size differences from replica to replica.
> 
> After compacting the origin (again) and the replicas, their sizes settle down to:
> 
> origin   : 262.4MB
> replica 1: 262.4MB
> replica 2: 262.5MB
> replica 3: 262.4MB
> 
> I'm trying to understand what the reason could be for the variance in pre-compacted database sizes.  All replicas are running the same build of CouchDB on the same version of Solaris, though replica3 is running on newer hardware in a VMWare container.
> 
> Matt

Hi Matt, the variation in target DB file sizes comes from variation in the number and size of the _bulk_docs calls the replicator makes.  The DB size is inversely correlated with the average size of a _bulk_docs POST, and that size is governed by the relative speed of the source and the target.  If the target is fast and the replication is limited by source throughput, you'll see lots of very small calls to _bulk_docs.  Conversely, if the target is slow, the replicator batches writes together in blocks of 1000 and sends them over.
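
To make the batching effect concrete, here is a toy Python simulation. It is only a sketch under stated assumptions, not the replicator's actual code: the source produces documents at a constant rate, every _bulk_docs call takes a fixed time on the target regardless of batch size, and a batch is whatever has queued up, capped at 1000. The function name and parameters are made up for illustration. Because the database file is append-only, each separate write adds its own updated B-tree nodes to the file, so many small batches leave more reclaimable space behind than a few large ones.

```python
import math


def simulate_bulk_batches(total_docs, source_rate, write_secs, max_batch=1000):
    """Return the batch sizes a toy replicator would POST to _bulk_docs.

    source_rate: documents per second the source can supply (assumed constant).
    write_secs:  seconds the target needs per _bulk_docs call (assumed fixed).
    """
    batches = []
    now = 0.0
    sent = 0
    while sent < total_docs:
        produced = min(total_docs, math.floor(source_rate * now))
        if produced <= sent:
            # The target is idle, so wait until one more document arrives.
            now = (sent + 1) / source_rate
            produced = sent + 1
        # Send everything queued so far, up to the batch cap.
        batch = min(max_batch, produced - sent)
        batches.append(batch)
        sent += batch
        # The target is busy writing while the source keeps producing.
        now += write_secs
    return batches


fast_target = simulate_bulk_batches(10_000, source_rate=1000, write_secs=0.001)
slow_target = simulate_bulk_batches(10_000, source_rate=1000, write_secs=2.0)
print("fast target:", len(fast_target), "calls, largest batch", max(fast_target))
print("slow target:", len(slow_target), "calls, largest batch", max(slow_target))
```

In this model the fast target receives thousands of tiny batches while the slow target receives a handful of full ones, which matches the direction of the size differences reported above.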

In short, the faster your target server is, the larger the uncompacted target DB will be.  Looks like that VMWare container isn't slowing you down much at all :)  Best,

Adam