Posted to user@couchdb.apache.org by st...@beatport.com on 2011/04/07 23:24:15 UTC

Couch Replication: slave on-disk size bloat

Hello there.

New couch admin here.

I've got two couchdb machines that are not being written to.

I cleaned off the slave, then replicated and compacted the DBs on the 
slave and the slave DB size on-disk is *way* more than on the master.

Also, we've found that when the slaves are being actively written to
(replicated to), they grow over time to more than 10x the master's file
size.

Master:

-rw-r--r--  1 couchdb couchdb      20565 Apr  7 12:02 api-keys.couch
-rw-r--r--  1 couchdb couchdb 2414678118 Apr  7 11:41 similar-tracks.couch
-rw-r--r--  1 couchdb couchdb  851714150 Apr  7 11:52 sitemap.couch
drwxr-xr-x  2 couchdb couchdb       4096 Feb 15 01:10 .sitemap_design
-rw-r--r--  1 couchdb couchdb   76939362 Apr  7 11:27 top-downloads.couch
-rw-r--r--  1 couchdb couchdb  268513382 Apr  7 11:30 users-also-bought.couch

Slave (after delete, replication and compaction):

-rw-r--r--  1 couchdb couchdb       8280 Apr  7 12:02 api-keys.couch
-rw-r--r--  1 couchdb couchdb 5261164653 Apr  7 14:16 similar-tracks.couch
-rw-r--r--  1 couchdb couchdb  860319850 Apr  7 13:58 sitemap.couch
-rw-r--r--  1 couchdb couchdb  122777700 Apr  7 13:50 top-downloads.couch
-rw-r--r--  1 couchdb couchdb  285810794 Apr  7 13:52 users-also-bought.couch
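
To put numbers on the two listings above, here is a small sketch that computes the slave-to-master size ratio per database. The byte counts are copied from the listings; the function name `size_ratios` and the dictionary layout are just illustrative choices.

```python
# Sizes in bytes, copied from the two `ls -l` listings above.
master = {
    "api-keys": 20565,
    "similar-tracks": 2414678118,
    "sitemap": 851714150,
    "top-downloads": 76939362,
    "users-also-bought": 268513382,
}
slave = {
    "api-keys": 8280,
    "similar-tracks": 5261164653,
    "sitemap": 860319850,
    "top-downloads": 122777700,
    "users-also-bought": 285810794,
}


def size_ratios(master_sizes, slave_sizes):
    """Return {db: slave_size / master_size} for databases present on both."""
    return {
        name: slave_sizes[name] / master_sizes[name]
        for name in master_sizes
        if name in slave_sizes
    }


ratios = size_ratios(master, slave)
# Print the largest growth first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {ratio:5.2f}x")
```

By this measure similar-tracks is roughly 2.2x and top-downloads roughly 1.6x the master's size, the other two large databases are within about 6%, and api-keys is actually smaller on the slave.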

Q1: Is this normal?
Q2: Is there a way to fix this or turn off whatever is turned on that 
causes this?

- Steve Webb

-- 
Steve Webb | System Administrator
Beatport | Play With Music
------------------------------------------
2399 Blake Street, Suite 170
Denver, Colorado USA 80205
tel: +1.720.932.9103
fax: +1.720.932.9104
noc: +1.303.565.2710
mobile: +1.303.564.4269

Re: Couch Replication: slave on-disk size bloat

Posted by Filipe David Manana <fd...@apache.org>.
On Fri, Apr 8, 2011 at 4:48 PM,  <st...@beatport.com> wrote:
>
> But I did do compaction and still had pretty huge files on the slave. Could
> it be something else that's causing this?

Ok, I missed that part. In that case, and at the moment, I have no
idea why that would happen :(




-- 
Filipe David Manana,
fdmanana@gmail.com, fdmanana@apache.org

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: Couch Replication: slave on-disk size bloat

Posted by st...@beatport.com.
> Yes.
> In 1.0.x and 1.1 (yet to be released), when the current replicator
> replicates a new revision of a document with attachments, it
> replicates all the attachments, not just the attachments missing on
> the target. So you might end up with the same attachments several
> times in the target database file (compaction will fix this, of
> course).
>
> The new replicator, in trunk only, doesn't have this behaviour - it
> replicates only the missing attachments.

But I did do compaction and still had pretty huge files on the slave. 
Could it be something else that's causing this?


- Steve


Re: Couch Replication: slave on-disk size bloat

Posted by Filipe David Manana <fd...@apache.org>.
On Fri, Apr 8, 2011 at 4:31 PM,  <st...@beatport.com> wrote:
> I'm not sure what's in the databases.  Why?  Would large attachments explain
> this in some way?

Yes.
In 1.0.x and 1.1 (yet to be released), when the current replicator
replicates a new revision of a document with attachments, it
replicates all the attachments, not just the attachments missing on
the target. So you might end up with the same attachments several
times in the target database file (compaction will fix this, of
course).

The new replicator, in trunk only, doesn't have this behaviour - it
replicates only the missing attachments.
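
For anyone wanting to check whether compaction reclaims the duplicated attachment data, the following is a minimal sketch using CouchDB's HTTP API: `GET /{db}` for the database info and `POST /{db}/_compact` to start compaction. The server address, the helper names, and the choice of fields are assumptions for illustration; the `disk_size` field is what 1.x-era servers report (newer releases expose sizes differently), and compaction may require admin credentials, which this sketch does not handle.

```python
import json
import urllib.request


def db_info_request(base_url, db):
    """Build the GET /{db} request that returns the database info document."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/{db}", method="GET")


def compact_request(base_url, db):
    """Build the POST /{db}/_compact request (CouchDB expects a JSON content type)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info):
    """Keep only the size-related fields of a 1.x-style database info document."""
    return {k: info.get(k) for k in ("doc_count", "disk_size", "compact_running")}


def run(base_url, db):
    """Print the size summary, then ask the server to start compaction."""
    print(summarize(fetch_json(db_info_request(base_url, db))))
    print(fetch_json(compact_request(base_url, db)))
```

Calling `run("http://127.0.0.1:5984", "similar-tracks")` against the slave (address assumed) prints the summary and starts compaction; compaction runs in the background, so the summary has to be fetched again once `compact_running` is false to see the new `disk_size`.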





Re: Couch Replication: slave on-disk size bloat

Posted by st...@beatport.com.
I'm not sure what's in the databases.  Why?  Would large attachments 
explain this in some way?

- Steve


Re: Couch Replication: slave on-disk size bloat

Posted by Filipe David Manana <fd...@apache.org>.
Do you have many and/or large attachments?



