Posted to user@couchdb.apache.org by "Roessner, Silvester" <si...@vision.zeiss.com> on 2010/03/24 09:29:59 UTC

Why is the actual size of a database 4 times larger than the net payload?

Hi all,

when I store 1000 copies of a pure JSON document (245,310 bytes each,
roughly 245 MB in total) in a freshly created database,
the database itself is 0.9 GB in size.

That is almost 4 times the actual net payload.

Is this normal?
If I move parts of the document into attachments,
will I also see this 4-fold increase?

Thanks in advance,

Rosswart
This message is intended for a particular addressee only and
may contain business or company secrets. If you have received
this email in error, please contact the sender and delete the
message immediately. Any use of this email, including saving,
publishing, copying, replication or forwarding of the message
or the contents is not permitted.


Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by "Roessner, Silvester" <si...@vision.zeiss.com>.
On Wed, Mar 24, 2010 at 15:58, Paul Davis wrote:
> On Wed, Mar 24, 2010 at 5:40 AM, Roessner, Silvester
> <si...@vision.zeiss.com> wrote:
>> On Wed, Mar 24 2010 at 10:06 Benoit Chesneau wrote:
>>
>> I tried it with consecutive ids as well but the database is still 4
>> times as big as the payload.
>> I think I must live with this fact.
>>
>>
>> Silvester
>>
>
> The consecutive ids only make a difference when using _bulk_docs
> because it removes the holes in the append-only btree.
>
> Did you check to see what the file size was after a compaction? It
> should be significantly better.


Hi Paul,

I just compacted the database containing 100 documents with consecutive
ids.
The size of the database stayed the same.

Here you can see what my test looks like.
#document is just a record I use to store metadata.
cma_document:save/1 saves a #document with the given _id in CouchDB.

Silvester


=== Code

%% Create one base document from the test file, then save
%% (End - Start + 1) copies of it under consecutive integer ids.
test6(Start, End) when End >= Start ->
    Database = "jobs",
    Empty = cma_document:new(Database),
    {ok, UTF8} = file:read_file(
        "V:/Product/CalcEngine/interface/OneC/Version-1.0/test/CzvRx/Output-localhost/13.onec.js"),
    Filled = cma_document:set_content(Empty, UTF8),
    test6_loop(Filled, Start, End).

%% Save the document once per id, printing a progress counter
%% with a newline after every tenth id.
test6_loop(Document, Start, End) when Start =< End ->
    io:format("~p ", [Start]),
    if
        Start rem 10 =:= 0 -> io:format("~n");
        true -> ok
    end,
    {ok, _} = cma_document:save(Document#document{id = integer_to_list(Start)}),
    test6_loop(Document, Start + 1, End);
test6_loop(_, _, _) ->
    io:format("~n").

=== Code
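
For reference, here is a minimal sketch (not part of the original test) of
how the on-disk size could be checked over HTTP and compared with the net
payload. It assumes CouchDB on localhost:5984, the "jobs" database used
above, and the disk_size field that CouchDB of that era reports in GET /db;
report_size/2 and extract_integer/2 are hypothetical helpers, and the field
is pulled out with a crude substring scan so no JSON library is needed.

=== Code

%% Sketch only: compare CouchDB's reported disk_size with the net payload.
report_size(BytesPerDoc, DocCount) ->
    inets:start(),
    {ok, {{_, 200, _}, _, Body}} =
        httpc:request(get, {"http://localhost:5984/jobs", []}, [], []),
    DiskSize = extract_integer("\"disk_size\":", Body),
    Payload = BytesPerDoc * DocCount,
    io:format("disk_size: ~B bytes, net payload: ~B bytes, ratio: ~.2f~n",
              [DiskSize, Payload, DiskSize / Payload]).

%% Crude field extraction: find the key and parse the integer after it.
extract_integer(Key, Body) ->
    Start = string:str(Body, Key) + length(Key),
    {Int, _} = string:to_integer(string:substr(Body, Start)),
    Int.

=== Code

Called as report_size(245310, 1000), this would print a ratio of roughly
3.7 for the 0.9 GB file described in this thread.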

Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by Paul Davis <pa...@gmail.com>.
On Wed, Mar 24, 2010 at 5:40 AM, Roessner, Silvester
<si...@vision.zeiss.com> wrote:
> On Wed, Mar 24 2010 at 10:06 Benoit Chesneau wrote:
>> Compaction not only removes revisions but also holes in the b-tree. Also
>> make sure you use consecutive ids.
>
> I tried it with consecutive ids as well but the database is still 4
> times as big as the payload.
> I think I must live with this fact.
>
>
> Silvester
>

The consecutive ids only make a difference when using _bulk_docs
because it removes the holes in the append-only btree.

Did you check to see what the file size was after a compaction? It
should be significantly better.

Paul Davis
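
As a rough illustration of the _bulk_docs point above (not from the original
thread), here is a minimal sketch that writes a range of documents with
consecutive integer ids in a single request. It assumes CouchDB on
localhost:5984 and a "jobs" database; bulk_insert/2 is a hypothetical helper
and the JSON bodies are hand-built so no JSON library is pulled in.

=== Code

%% Sketch only: save a range of documents in one _bulk_docs POST instead
%% of one request per document, using consecutive integer ids.
bulk_insert(Start, End) when End >= Start ->
    inets:start(),
    Docs = [lists:flatten(io_lib:format(
                "{\"_id\":\"~B\",\"content\":\"payload goes here\"}", [N]))
            || N <- lists:seq(Start, End)],
    Body = ["{\"docs\":[", string:join(Docs, ","), "]}"],
    %% CouchDB answers 201 Created when the batch is accepted.
    {ok, {{_, 201, _}, _, _}} =
        httpc:request(post,
                      {"http://localhost:5984/jobs/_bulk_docs",
                       [], "application/json", iolist_to_binary(Body)},
                      [], []),
    ok.

=== Code

One batch like this leaves far fewer partially filled btree nodes than a
thousand individual saves, which is the point about holes in the
append-only btree made above.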

Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by "Roessner, Silvester" <si...@vision.zeiss.com>.
On Wed, Mar 24 2010 at 10:06 Benoit Chesneau wrote:
> Compaction not only removes revisions but also holes in the b-tree. Also
> make sure you use consecutive ids.

I tried it with consecutive ids as well but the database is still 4
times as big as the payload.
I think I must live with this fact.


Silvester

Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by Benoit Chesneau <bc...@gmail.com>.
On Wed, Mar 24, 2010 at 10:02 AM, Roessner, Silvester
<si...@vision.zeiss.com> wrote:
> On 24 Mar 2010, at 09:36, Jan Lehnardt [jan@apache.org] wrote:
>
>> Yeah, you'll want to run compaction to reduce on-disk file size.
>> See http://wiki.apache.org/couchdb/Compaction for details.
>
> I stored each of the copies as a separate document with a unique ID.
> My goal was to estimate how much disk space my database will consume.
> So there are no old revisions to purge.
>
> I did a compaction already, since I hoped it would shrink the file anyway.
> But since there are no old revisions, the database stays at 0.9 GB.

Compaction not only removes revisions but also holes in the b-tree. Also
make sure you use consecutive ids.

- benoit

Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by "Roessner, Silvester" <si...@vision.zeiss.com>.
On 24 Mar 2010, at 09:36, Jan Lehnardt [jan@apache.org] wrote:

> Yeah, you'll want to run compaction to reduce on-disk file size.
> See http://wiki.apache.org/couchdb/Compaction for details.

I stored each of the copies as a separate document with a unique ID.
My goal was to estimate how much disk space my database will consume.
So there are no old revisions to purge.

I did a compaction already, since I hoped it would shrink the file anyway.
But since there are no old revisions, the database stays at 0.9 GB.


> This message is intended for a particular addressee only and
> may contain business or company secrets. If you have received
> this email in error, please contact the sender and delete the
> message immediately. Any use of this email, including saving,
> publishing, copying, replication or forwarding of the message
> or the contents is not permitted.

Oh - this one's come from my company's e-mail system, I think.

Cheers

Rosswart
This message is intended for a particular addressee only and
may contain business or company secrets. If you have received
this email in error, please contact the sender and delete the
message immediately. Any use of this email, including saving,
publishing, copying, replication or forwarding of the message
or the contents is not permitted.


Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by Jan Lehnardt <ja...@apache.org>.
On 24 Mar 2010, at 01:29, Roessner, Silvester wrote:

> Hi all,
> 
> when I store 1000 copies of a pure JSON document (245,310 bytes each) in
> a freshly created database,
> the database itself is 0.9 GB in size.
> 
> That is almost 4 times the actual net payload.
> 
> Is this normal?

Yeah, you'll want to run compaction to reduce on-disk file size.
See http://wiki.apache.org/couchdb/Compaction for details.

> If I move parts of the document into attachments,
> will I also see this 4-fold increase?

It's not the document contents, but the sparse btree structure that 
causes the extra space to be used.


> This message is intended for a particular addressee only and
> may contain business or company secrets. If you have received
> this email in error, please contact the sender and delete the
> message immediately. Any use of this email, including saving,
> publishing, copying, replication or forwarding of the message
> or the contents is not permitted.

When you publish stuff on the web, it's out there.

Cheers
Jan
--
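
As a rough illustration of the compaction advice (not from the original
thread), here is a minimal sketch that triggers compaction over HTTP and
polls the database info document until compact_running goes back to false.
It assumes CouchDB on localhost:5984; compact/1 and wait_for_compaction/1
are hypothetical helpers, and the response is checked with a plain
substring match rather than a JSON parser.

=== Code

%% Sketch only: POST /db/_compact (CouchDB answers 202 Accepted), then
%% poll GET /db until "compact_running" is no longer true.
compact(Db) ->
    inets:start(),
    Base = "http://localhost:5984/" ++ Db,
    {ok, {{_, 202, _}, _, _}} =
        httpc:request(post,
                      {Base ++ "/_compact", [], "application/json", <<>>},
                      [], []),
    wait_for_compaction(Base).

wait_for_compaction(Base) ->
    {ok, {{_, 200, _}, _, Body}} = httpc:request(get, {Base, []}, [], []),
    case string:str(Body, "\"compact_running\":true") of
        0 -> io:format("compaction finished~n");
        _ -> timer:sleep(1000), wait_for_compaction(Base)
    end.

=== Code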


Re: Why is the actual size of a database 4 times larger than the net payload?

Posted by Benoit Chesneau <bc...@gmail.com>.
On Wed, Mar 24, 2010 at 9:29 AM, Roessner, Silvester
<si...@vision.zeiss.com> wrote:
> Hi all,
>
> when I store 1000 copies of a pure JSON document (245,310 bytes each) in
> a freshly created database,
> the database itself is 0.9 GB in size.
>
> That is almost 4 times the actual net payload.
>
> Is this normal?
> If I move parts of the document into attachments,
> will I also see this 4-fold increase?
>
> Thanks in advance,
>

_compact is your friend.

- benoit