Posted to user@couchdb.apache.org by Sharath <sh...@gmail.com> on 2015/01/25 05:11:25 UTC

couchdb database size

Hi All,

I recently moved to CouchDB and find my databases taking a lot of disk space.

I have two databases, both with JSON documents (no attachments); however,
the sizes vary by a lot:

database1      size 8.0GB    number of documents: 13337224
database2      size 29.4 GB    number of documents: 12981148

Both databases have been compacted.

Each document in database1 is 487 bytes long (including _id and _rev).
Each document in database2 is 564 bytes long (including _id and _rev).

database1 should be ~6.0GB (only data, without compression)
[487 * 13337224 / 1024 / 1024 / 1024]
database2 should be ~6.8GB (only data, without compression)
[564 * 12981148 / 1024 / 1024 / 1024]

I'm curious why the database file takes 29 GB.

Unfortunately I cannot post the documents, as this is prod data.

CouchDB is running on my Mac (OS X 10.10.1) with the default configuration.

database1 was populated by a bulk upload from a MySQL extract, and
database2 was populated by individual document inserts (PUT). Database
compaction was allowed to run to completion (it took ~30 hr on database2).

Is there a command that compacts superfluous data, or am I missing anything?


thanks!

-Sharath

RE: Why am i seeing this behavior?

Posted by Stanley Iriele <si...@gmail.com>.
Yes, you can, but you might want to change your view first. Index on
time; that is to say, emit the creation date in milliseconds into the
view, then use startkey=now() / endkey=now() - 7 days or something.
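
A minimal sketch of that idea (the createddate field name is taken from the
earlier message; everything else is illustrative, not tested):

function(doc) {
  // emit the creation time in milliseconds as the view key
  if (doc.createddate) {
    emit(new Date(doc.createddate).getTime(), null);
  }
}

The client then computes the window at request time, e.g. everything from
the last 7 days:

GET /db/_design/app/_view/by_created?startkey=<now_ms - 7*86400000>&endkey=<now_ms>
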
On Jan 29, 2015 8:03 AM, "TAE JIN KIM" <sn...@hotmail.com> wrote:

> Are you sure I can use list function for this purpose?
> I tried, but it is same behavior.
>
> > Date: Thu, 29 Jan 2015 04:45:49 +0400
> > Subject: Re: Why am i seeing this behavior?
> > From: kxepal@gmail.com
> > To: user@couchdb.apache.org
> >
> > A view function gets executed on a view resource request, and only for
> > updated documents.
> > Once a document gets indexed, the result is stored in the view index
> > file and read from there from that moment on.
> > If you need request-dependent view results, use list functions instead.
> > --
> > ,,,^..^,,,
> >
> >
> > On Thu, Jan 29, 2015 at 3:33 AM, TAE JIN KIM <sn...@hotmail.com>
> wrote:
> > > Hello,
> > >
> > > Here is my view.
> > >
> > > function(doc) {
> > >       var cloned = eval(uneval(doc));
> > >       cloned.status = myfunction(cloned.createddate);
> > >   …
> > >  ..
> > > }
> > >
> > > function myfunction(cd) {
> > >     var currentDate = new Date();
> > >     ……..currentDate - cd… // calculate date difference…
> > >
> > > }
> > >
> > > basically, myfunction returns some status based on date difference...
> > > I am consuming this one via the REST API, and this works fine, but the
> thing is that it looks like CouchDB doesn't always run this whenever it is
> invoked by the API? Somehow it returns a cached version??
> > >
> > > I meant...let's say here is some expected table...
> > >
> > > Here are expected Values
> > >
> > > 3 Days ago - 'Status A'
> > > 2 Days ago - 'Status B'
> > > yesterday  - 'Status C'
> > > today - 'Status D'
> > >
> > > So I am expecting to see the status value 'Status D' today,
> but it still shows 'Status B' for some reason via the REST API.  I have to
> open the CouchDB web admin interface and run this view manually in order
> to see the correct status value, 'Status D'.
> > >
> > > Why is that?...
> > > Any idea?
> > >
> > > Thanks,
> > >
> > >
>

RE: Why am i seeing this behavior?

Posted by TAE JIN KIM <sn...@hotmail.com>.
Are you sure I can use a list function for this purpose?
I tried, but it is the same behavior.

> Date: Thu, 29 Jan 2015 04:45:49 +0400
> Subject: Re: Why am i seeing this behavior?
> From: kxepal@gmail.com
> To: user@couchdb.apache.org
> 
> A view function gets executed on a view resource request, and only for
> updated documents.
> Once a document gets indexed, the result is stored in the view index
> file and read from there from that moment on.
> If you need request-dependent view results, use list functions instead.
> --
> ,,,^..^,,,
> 
> 
> On Thu, Jan 29, 2015 at 3:33 AM, TAE JIN KIM <sn...@hotmail.com> wrote:
> > Hello,
> >
> > Here is my view.
> >
> > function(doc) {
> >       var cloned = eval(uneval(doc));
> >       cloned.status = myfunction(cloned.createddate);
> >   …
> >  ..
> > }
> >
> > function myfunction(cd) {
> >     var currentDate = new Date();
> >     ……..currentDate - cd… // calculate date difference…
> >
> > }
> >
> > basically, myfunction returns some status based on date difference...
> > I am consuming this one via the REST API, and this works fine, but it looks like CouchDB doesn't always run this whenever it is invoked by the API? Somehow it returns a cached version??
> >
> > I meant...let's say here is some expected table...
> >
> > Here are expected Values
> >
> > 3 Days ago - 'Status A'
> > 2 Days ago - 'Status B'
> > yesterday  - 'Status C'
> > today - 'Status D'
> >
> > So I am expecting to see the status value 'Status D' today, but it still shows 'Status B' for some reason via the REST API.  I have to open the CouchDB web admin interface and run this view manually in order to see the correct status value, 'Status D'.
> >
> > Why is that?...
> > Any idea?
> >
> > Thanks,
> >
> >

Re: Update handlers

Posted by Johannes Jörg Schmidt <sc...@netzmerk.com>.
Note that req.body, in contrast to req.query, is the raw request body and
has to be parsed manually: JSON.parse(req.body)
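
For example, a minimal sketch of an update function that takes the same
field/value pair from a JSON POST body instead of the query string:

function(doc, req) {
  // req.body is a raw string, so parse it first
  var body = JSON.parse(req.body);
  doc[body.field] = body.value;
  return [doc, 'ok'];
}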

2015-01-29 20:52 GMT+01:00 Kiril Stankov <ki...@open-net.biz>:

> Yes, it really does...
> I am using nano.
> Must be problem there.
> Thanks!
>
> К.
>
>
>
>
> On 1/29/2015 8:08 PM, Alexander Shorin wrote:
>
>> Works for me for the following request:
>>
>> curl -XPOST 'http://localhost:5984/a/_design/test/_update/test/test?
>> field=foo&value=bar'
>> --
>> ,,,^..^,,,
>>
>>
>> On Thu, Jan 29, 2015 at 8:51 PM, Kiril Stankov <ki...@open-net.biz>
>> wrote:
>>
>>> Hi again,
>>>
>>> I'm still trying to use the Update Handlers, but I'm running into a
>>> problem:
>>>
>>> function(doc, req) {
>>>     var field = req.query.field;
>>>     var value = req.query.value;
>>>     doc[field] = value;
>>>     return [doc, 'ok'];
>>> }
>>>
>>> The function executes, returns 'ok', the revision of the doc is
>>> increased by
>>> 2, but the fields are not changed.
>>> The field I want to update is present in the doc.
>>> Any idea what am I doing wrong?
>>>
>>> Thanks in advance.
>>> ------------------------------------------------------------------------
>>> *With best regards,*
>>> Kiril Stankov,
>>>
>>> On 1/29/2015 2:52 PM, Kiril Stankov wrote:
>>>
>>>> Thanks all!
>>>> ------------------------------------------------------------
>>>> ------------
>>>> *With best regards,*
>>>> Kiril Stankov,
>>>> CEO
>>>>
>>>>
>>>>             This Email disclaimer
>>>>             <http://open-net.biz/emailsignature.html> is integral part
>>>>             of this message.
>>>>
>>>> On 1/29/2015 11:57 AM, Alexander Shorin wrote:
>>>>
>>>>> On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz <thewhorn@googlemail.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>>> With the help of a rewrite rule you could force a PUT to be handled
>>>>>> by an update handler before the doc gets stored
>>>>>>
>>>>> Need to keep in mind that this rewrite rule could be easily bypassed
>>>>> by bulk update (POST /db/_bulk_docs).
>>>>>
>>>>> --
>>>>> ,,,^..^,,,
>>>>>
>>>>
>>>>
>>>>
>

Re: Update handlers

Posted by Kiril Stankov <ki...@open-net.biz>.
Yes, it really does...
I am using nano.
Must be a problem there.
Thanks!

К.



On 1/29/2015 8:08 PM, Alexander Shorin wrote:
> Works for me for the following request:
>
> curl -XPOST 'http://localhost:5984/a/_design/test/_update/test/test?field=foo&value=bar'
> --
> ,,,^..^,,,
>
>
> On Thu, Jan 29, 2015 at 8:51 PM, Kiril Stankov <ki...@open-net.biz> wrote:
>> Hi again,
>>
>> I'm still trying to use the Update Handlers, but I'm running into a problem:
>>
>> function(doc, req) {
>>     var field = req.query.field;
>>     var value = req.query.value;
>>     doc[field] = value;
>>     return [doc, 'ok'];
>> }
>>
>> The function executes, returns 'ok', the revision of the doc is increased by
>> 2, but the fields are not changed.
>> The field I want to update is present in the doc.
>> Any idea what am I doing wrong?
>>
>> Thanks in advance.
>> ------------------------------------------------------------------------
>> *With best regards,*
>> Kiril Stankov,
>>
>> On 1/29/2015 2:52 PM, Kiril Stankov wrote:
>>> Thanks all!
>>> ------------------------------------------------------------------------
>>> *With best regards,*
>>> Kiril Stankov,
>>> CEO
>>>
>>>
>>>             This Email disclaimer
>>>             <http://open-net.biz/emailsignature.html> is integral part
>>>             of this message.
>>>
>>> On 1/29/2015 11:57 AM, Alexander Shorin wrote:
>>>> On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz <th...@googlemail.com>
>>>> wrote:
>>>>> With the help of a rewrite rule you could force a PUT to be handled by
>>>>> an update handler before the doc gets stored
>>>> Need to keep in mind that this rewrite rule could be easily bypassed
>>>> by bulk update (POST /db/_bulk_docs).
>>>>
>>>> --
>>>> ,,,^..^,,,
>>>
>>>


Re: Update handlers

Posted by Alexander Shorin <kx...@gmail.com>.
Works for me for the following request:

curl -XPOST 'http://localhost:5984/a/_design/test/_update/test/test?field=foo&value=bar'
--
,,,^..^,,,


On Thu, Jan 29, 2015 at 8:51 PM, Kiril Stankov <ki...@open-net.biz> wrote:
> Hi again,
>
> I'm still trying to use the Update Handlers, but I'm running into a problem:
>
> function(doc, req) {
>     var field = req.query.field;
>     var value = req.query.value;
>     doc[field] = value;
>     return [doc, 'ok'];
> }
>
> The function executes, returns 'ok', the revision of the doc is increased by
> 2, but the fields are not changed.
> The field I want to update is present in the doc.
> Any idea what am I doing wrong?
>
> Thanks in advance.
> ------------------------------------------------------------------------
> *With best regards,*
> Kiril Stankov,
>
> On 1/29/2015 2:52 PM, Kiril Stankov wrote:
>>
>> Thanks all!
>> ------------------------------------------------------------------------
>> *With best regards,*
>> Kiril Stankov,
>> CEO
>>
>>
>>            This Email disclaimer
>>            <http://open-net.biz/emailsignature.html> is integral part
>>            of this message.
>>
>> On 1/29/2015 11:57 AM, Alexander Shorin wrote:
>>>
>>> On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz <th...@googlemail.com>
>>> wrote:
>>>>
>>>> With the help of a rewrite rule you could force a PUT to be handled by
>>>> an update handler before the doc gets stored
>>>
>>> Need to keep in mind that this rewrite rule could be easily bypassed
>>> by bulk update (POST /db/_bulk_docs).
>>>
>>> --
>>> ,,,^..^,,,
>>
>>
>>
>

Re: Update handlers

Posted by Kiril Stankov <ki...@open-net.biz>.
Hi again,

I'm still trying to use the Update Handlers, but I'm running into a problem:

function(doc, req) {
    var field = req.query.field;
    var value = req.query.value;
    doc[field] = value;
    return [doc, 'ok'];
}

The function executes, returns 'ok', the revision of the doc is 
increased by 2, but the fields are not changed.
The field I want to update is present in the doc.
Any idea what am I doing wrong?

Thanks in advance.
------------------------------------------------------------------------
*With best regards,*
Kiril Stankov,

On 1/29/2015 2:52 PM, Kiril Stankov wrote:
> Thanks all!
> ------------------------------------------------------------------------
> *With best regards,*
> Kiril Stankov,
> CEO
>
>
>            This Email disclaimer
>            <http://open-net.biz/emailsignature.html> is integral part
>            of this message.
>
> On 1/29/2015 11:57 AM, Alexander Shorin wrote:
>> On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz 
>> <th...@googlemail.com> wrote:
>>> With the help of a rewrite rule you could force a PUT to be handled
>>> by an update handler before the doc gets stored
>> Need to keep in mind that this rewrite rule could be easily bypassed
>> by bulk update (POST /db/_bulk_docs).
>>
>> -- 
>> ,,,^..^,,,
>
>


Re: Update handlers

Posted by Kiril Stankov <ki...@open-net.biz>.
Thanks all!
------------------------------------------------------------------------
*With best regards,*
Kiril Stankov,
CEO


            This Email disclaimer
            <http://open-net.biz/emailsignature.html> is integral part
            of this message.

On 1/29/2015 11:57 AM, Alexander Shorin wrote:
> On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz <th...@googlemail.com> wrote:
>> With the help of a rewrite rule you could force a PUT to be handled by an update handler before the doc gets stored
> Need to keep in mind that this rewrite rule could be easily bypassed
> by bulk update (POST /db/_bulk_docs).
>
> --
> ,,,^..^,,,


Re: Update handlers

Posted by Alexander Shorin <kx...@gmail.com>.
On Thu, Jan 29, 2015 at 12:45 PM, Ingo Radatz <th...@googlemail.com> wrote:
> With the help of a rewrite rule you could force a PUT to be handled by an update handler before the doc gets stored

Need to keep in mind that this rewrite rule could be easily bypassed
by bulk update (POST /db/_bulk_docs).

--
,,,^..^,,,

Re: Update handlers

Posted by Ingo Radatz <th...@googlemail.com>.
With the help of a rewrite rule you could force a PUT to be handled by an update handler before the doc gets stored:

{
  "from": "/:docid",
  "to":  "_update/addTimestamp/:docid",
  "method": "PUT"
}

Your root URI must point to the /_rewrite handler of the design doc (a vhost setting can simplify/ensure that).
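
The addTimestamp handler itself is not shown in this thread; a minimal
sketch of what it might look like (the createddate field comes from the
original question, the rest is illustrative and ignores conflict handling):

function(doc, req) {
  // on PUT the incoming document arrives as the raw request body
  var newDoc = JSON.parse(req.body);
  newDoc._id = req.id;
  // stamp the creation date on the way in
  newDoc.createddate = new Date().getTime();
  return [newDoc, 'ok'];
}

A vhost entry in local.ini can then route the root URI to the rewriter
(names are placeholders):

[vhosts]
example.com = /db/_design/app/_rewrite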

Re: Update handlers

Posted by Alexander Shorin <kx...@gmail.com>.
Yes, update handlers need to be requested explicitly.
No, CouchDB doesn't provide any method to change your documents
automagically on PUT. It's not obvious how this feature should work
in the case of replication.
Normally, you don't need any server-side logic. All you need is to ensure
that the clients send you valid documents (e.g. set up
validate_doc_update functions). Whether they call update handlers or
implement the update-handler logic on their side is not a question you
should be concerned with. Valid data is.
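
A minimal sketch of that suggestion, assuming the createddate field from
the question is what needs to be guaranteed:

function(newDoc, oldDoc, userCtx, secObj) {
  // reject any write that doesn't carry a creation timestamp
  if (!newDoc._deleted && !newDoc.createddate) {
    throw({forbidden: 'createddate is required'});
  }
}
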
--
,,,^..^,,,


On Thu, Jan 29, 2015 at 12:12 PM, Kiril Stankov <ki...@open-net.biz> wrote:
> Hi,
>
> I'm new to CouchDB and I have a question about updates.
> It seems that I misunderstood how they work.
> I assumed that they are automatically called upon submitting or modifying a
> document to the db.
> For example, I add a document and I want a field 'createddate' to be
> automatically filled with current time stamp.
>
> Now I realize that to make an update handler run, I need to call its URL.
>
> Is there a way to achieve what I need, or should I add the logic to my
> server-side code?
>
> Thanks in advance!
> ------------------------------------------------------------------------
> *With best regards,*
> Kiril Stankov,
>

Update handlers

Posted by Kiril Stankov <ki...@open-net.biz>.
Hi,

I'm new to CouchDB and I have a question about update handlers.
It seems that I misunderstood how they work.
I assumed that they are automatically called upon submitting or
modifying a document in the db.
For example, I add a document and I want a field 'createddate' to be
automatically filled with the current timestamp.

Now I realize that to make an update handler run, I need to call its URL.

Is there a way to achieve what I need, or should I add the logic to my
server-side code?

Thanks in advance!
------------------------------------------------------------------------
*With best regards,*
Kiril Stankov,


Re: Why am i seeing this behavior?

Posted by Stanley Iriele <si...@gmail.com>.
Hey TAE,

I think you have a misunderstanding about how views work in CouchDB in
general, so let me start at a very high level.
When you create an index in SQL, what basically happens is the database
runs a CREATE INDEX function, and that index data structure is updated,
whether it's a b-tree or whatever, every time you write a record that
affects it. The thing to remember is that in a SQL relational database
you cannot edit the function that populates the underlying data structure.

CouchDB's views are basically disk indexes, but there are a few differences
worth noting.

1) CouchDB exposes the actual function that gets run on a per-doc basis to
populate the view, instead of executing some arbitrary internal function.

2) CouchDB does not update an index/view on write like SQL databases do.
It instead updates each view every time you request a read. Each db in
CouchDB maintains a seq number that's basically the number of writes in
the db (not totally true, but moving on). Each view knows what sequence
number it has been built up to. If the seq number of the db and the view
do not match, it looks to see what documents have changed since then by
examining the changes feed. It then says "oh hey, there have been 29
writes on 2 documents; time to call TAE's map(doc) function on these 2
documents and fix the b-tree index".

When you open CouchDB and "run the view manually", you're basically
blowing away the index and calling CREATE INDEX again on your DB.

I see that you are cloning the documents for some reason. What are you
trying to accomplish exactly?
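
You can watch this mechanism by comparing the db's sequence number with the
one the view group was built up to (db and design doc names are placeholders):

curl http://localhost:5984/db                    # "update_seq" of the db
curl http://localhost:5984/db/_design/app/_info  # "update_seq" the view index has reached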

On Wed, Jan 28, 2015 at 5:04 PM, TAE JIN KIM <sn...@hotmail.com> wrote:

> I see... but is there any way I can optionally force it somehow so the
> view always gets executed whenever it is invoked by the REST API? (Let's
> suppose a list function is not an option for me.)
>
>
>
> > Date: Thu, 29 Jan 2015 04:45:49 +0400
> > Subject: Re: Why am i seeing this behavior?
> > From: kxepal@gmail.com
> > To: user@couchdb.apache.org
> >
> > A view function gets executed on a view resource request, and only for
> > updated documents.
> > Once a document gets indexed, the result is stored in the view index
> > file and read from there from that moment on.
> > If you need request-dependent view results, use list functions instead.
> > --
> > ,,,^..^,,,
> >
> >
> > On Thu, Jan 29, 2015 at 3:33 AM, TAE JIN KIM <sn...@hotmail.com>
> wrote:
> > > Hello,
> > >
> > > Here is my view.
> > >
> > > function(doc) {
> > >       var cloned = eval(uneval(doc));
> > >       cloned.status = myfunction(cloned.createddate);
> > >   …
> > >  ..
> > > }
> > >
> > > function myfunction(cd) {
> > >     var currentDate = new Date();
> > >     ……..currentDate - cd… // calculate date difference…
> > >
> > > }
> > >
> > > basically, myfunction returns some status based on date difference...
> > > I am consuming this one via the REST API, and this works fine, but the
> thing is that it looks like CouchDB doesn't always run this whenever it is
> invoked by the API? Somehow it returns a cached version??
> > >
> > > I meant...let's say here is some expected table...
> > >
> > > Here are expected Values
> > >
> > > 3 Days ago - 'Status A'
> > > 2 Days ago - 'Status B'
> > > yesterday  - 'Status C'
> > > today - 'Status D'
> > >
> > > So I am expecting to see the status value 'Status D' today,
> but it still shows 'Status B' for some reason via the REST API.  I have to
> open the CouchDB web admin interface and run this view manually in order
> to see the correct status value, 'Status D'.
> > >
> > > Why is that?...
> > > Any idea?
> > >
> > > Thanks,
> > >
> > >
>
>

RE: Why am i seeing this behavior?

Posted by TAE JIN KIM <sn...@hotmail.com>.
I see... but is there any way I can optionally force it somehow so the view always gets executed whenever it is invoked by the REST API? (Let's suppose a list function is not an option for me.)



> Date: Thu, 29 Jan 2015 04:45:49 +0400
> Subject: Re: Why am i seeing this behavior?
> From: kxepal@gmail.com
> To: user@couchdb.apache.org
> 
> A view function gets executed on a view resource request, and only for
> updated documents.
> Once a document gets indexed, the result is stored in the view index
> file and read from there from that moment on.
> If you need request-dependent view results, use list functions instead.
> --
> ,,,^..^,,,
> 
> 
> On Thu, Jan 29, 2015 at 3:33 AM, TAE JIN KIM <sn...@hotmail.com> wrote:
> > Hello,
> >
> > Here is my view.
> >
> > function(doc) {
> >       var cloned = eval(uneval(doc));
> >       cloned.status = myfunction(cloned.createddate);
> >   …
> >  ..
> > }
> >
> > function myfunction(cd) {
> >     var currentDate = new Date();
> >     ……..currentDate - cd… // calculate date difference…
> >
> > }
> >
> > basically, myfunction returns some status based on date difference...
> > I am consuming this one via the REST API, and this works fine, but it looks like CouchDB doesn't always run this whenever it is invoked by the API? Somehow it returns a cached version??
> >
> > I meant...let's say here is some expected table...
> >
> > Here are expected Values
> >
> > 3 Days ago - 'Status A'
> > 2 Days ago - 'Status B'
> > yesterday  - 'Status C'
> > today - 'Status D'
> >
> > So I am expecting to see the status value 'Status D' today, but it still shows 'Status B' for some reason via the REST API.  I have to open the CouchDB web admin interface and run this view manually in order to see the correct status value, 'Status D'.
> >
> > Why is that?...
> > Any idea?
> >
> > Thanks,
> >
> >

Re: Why am i seeing this behavior?

Posted by Alexander Shorin <kx...@gmail.com>.
A view function gets executed on a view resource request, and only for
updated documents.
Once a document gets indexed, the result is stored in the view index
file and read from there from that moment on.
If you need request-dependent view results, use list functions instead.
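
A minimal sketch of such a list function (names are illustrative; it
assumes a view that emits the creation time in milliseconds as its key,
and the age calculation stands in for the myfunction from the question):

function(head, req) {
  // a list function runs on every request, so "now" is always current
  var now = new Date().getTime();
  start({headers: {'Content-Type': 'application/json'}});
  var row, results = [];
  while ((row = getRow())) {
    var ageDays = (now - row.key) / 86400000;
    results.push({id: row.id, days_old: Math.floor(ageDays)});
  }
  send(JSON.stringify(results));
}
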
--
,,,^..^,,,


On Thu, Jan 29, 2015 at 3:33 AM, TAE JIN KIM <sn...@hotmail.com> wrote:
> Hello,
>
> Here is my view.
>
> function(doc) {
>       var cloned = eval(uneval(doc));
>       cloned.status = myfunction(cloned.createddate);
>   …
>  ..
> }
>
> function myfunction(cd) {
>     var currentDate = new Date();
>     ……..currentDate - cd… // calculate date difference…
>
> }
>
> basically, myfunction returns some status based on date difference...
> I am consuming this one via the REST API, and this works fine, but it looks like CouchDB doesn't always run this whenever it is invoked by the API? Somehow it returns a cached version??
>
> I meant...let's say here is some expected table...
>
> Here are expected Values
>
> 3 Days ago - 'Status A'
> 2 Days ago - 'Status B'
> yesterday  - 'Status C'
> today - 'Status D'
>
> So I am expecting to see the status value 'Status D' today, but it still shows 'Status B' for some reason via the REST API.  I have to open the CouchDB web admin interface and run this view manually in order to see the correct status value, 'Status D'.
>
> Why is that?...
> Any idea?
>
> Thanks,
>
>

Why am i seeing this behavior?

Posted by TAE JIN KIM <sn...@hotmail.com>.
Hello,

Here is my view.

function(doc) {
      var cloned = eval(uneval(doc));
      cloned.status = myfunction(cloned.createddate);
  …
 ..
}

function myfunction(cd) {
    var currentDate = new Date();
    ……..currentDate - cd… // calculate date difference…

}

basically, myfunction returns some status based on the date difference...
I am consuming this one via the REST API, and this works fine, but it looks like CouchDB doesn't always run this whenever it is invoked by the API? Somehow it returns a cached version??

I mean... let's say here is the expected table...

Here are expected Values

3 Days ago - 'Status A'
2 Days ago - 'Status B'
yesterday  - 'Status C'
today - 'Status D'

So I am expecting to see the status value 'Status D' today, but it still shows 'Status B' for some reason via the REST API.  I have to open the CouchDB web admin interface and run this view manually in order to see the correct status value, 'Status D'.

Why is that?...
Any idea?

Thanks,


Re: couchdb database size

Posted by Sharath <sh...@gmail.com>.
Hi Alexander,

Thanks for your pointers.

-Sharath

On Tue, Jan 27, 2015 at 10:58 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Hi Sharath,
>
> Glad that this tip helped.
>
> On Tue, Jan 27, 2015 at 11:53 AM, Sharath <sh...@gmail.com> wrote:
> > Q: how does one determine the optimum checkpoint_after and
> doc_buffer_size?
> > If this is dependent on the document size, then this value should be
> > configurable per database.
>
> Personally, I don't know the answer here. It's worth playing around with
> it to find a formula, or inspecting the file format deeply to figure out
> what causes the uncompactable wasted space in it. But indeed, blindly
> increasing the buffer sizes by a factor of 10 multiple times isn't a good
> way to go.
>
> --
> ,,,^..^,,,
>

Re: couchdb database size

Posted by Alexander Shorin <kx...@gmail.com>.
Hi Sharath,

Glad that this tip helped.

On Tue, Jan 27, 2015 at 11:53 AM, Sharath <sh...@gmail.com> wrote:
> Q: how does one determine the optimum checkpoint_after and doc_buffer_size?
> If this is dependent on the document size, then this value should be
> configurable per database.

Personally, I don't know the answer here. It's worth playing around with
it to find a formula, or inspecting the file format deeply to figure out
what causes the uncompactable wasted space in it. But indeed, blindly
increasing the buffer sizes by a factor of 10 multiple times isn't a good
way to go.

--
,,,^..^,,,

Re: couchdb database size

Posted by Sharath <sh...@gmail.com>.
With the following settings, the size came down to 14GB:

checkpoint_after = 524288000
doc_buffer_size = 52428800

{
    "db_name": "database2",
    "doc_count": 12986513,
    "doc_del_count": 0,
    "update_seq": 12986513,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 15156363386,
    "data_size": 8034864363,
    "instance_start_time": "1422213492581804",
    "disk_format_version": 6,
    "committed_update_seq": 12986513
}

ratio disksize/datasize = 1.88

When I further increase the settings by a factor of 10, the size reduces to
8.9GB:
checkpoint_after = 5242880000
doc_buffer_size = 524288000
{
    "db_name": "database2",
    "doc_count": 12986513,
    "doc_del_count": 0,
    "update_seq": 12986513,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 9518403706,
    "data_size": 8027906267,
    "instance_start_time": "1422302560319114",
    "disk_format_version": 6,
    "committed_update_seq": 12986513
}

ratio disksize/datasize = 1.18

My other database got even better, with a ratio of 1.00052:

{
    "db_name": "database1",
    "doc_count": 13337224,
    "doc_del_count": 0,
    "update_seq": 13337224,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 6897811578,
    "data_size": 6894215476,
    "instance_start_time": "1422302561058460",
    "disk_format_version": 6,
    "committed_update_seq": 13337224
}

Q: how does one determine the optimum checkpoint_after and doc_buffer_size?
If this is dependent on the document size, then these values should be
configurable per database.

-Sharath

On Mon, Jan 26, 2015 at 5:31 AM, Sharath <sh...@gmail.com> wrote:

> Thanks - I've set the following values:
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> and started the compact process. Have to wait for a bit.
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <kx...@gmail.com>
> wrote:
>
>> Ok, so far, this looks exactly what I have for my hashes databases:
>>
>> data_size: 557537537
>> disk_size: 1542664311
>> doc_count: 1298255
>> doc_del_count: 18
>> avg doc size: ~350 bytes
>>
>> While there is a 3x disk_size/data_size ratio, this database is
>> uncompactable: CouchDB isn't able to get it to 500MB size, leaving it
>> at 1.5GB. This looks like some "specifics" of the underlying database
>> format, which isn't able to rationally allocate a huge amount of tiny
>> documents... But CouchDB provides two interesting options to
>> configure database compaction: doc_buffer_size and checkpoint_after.
>>
>> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>>
>> By default they have the following values:
>>
>> checkpoint_after = 5242880
>> doc_buffer_size = 524288
>>
>> And this makes my hashes database stop at the 1.5GB point. If I
>> multiply them both by 10, after compaction the database size will be
>> ~900MB - yay! If I do this again with the resulting config:
>>
>> checkpoint_after = 524288000
>> doc_buffer_size = 52428800
>>
>> Then the database sizes will be much better:
>>
>> disk_size: 633688183
>> data_size: 556759808
>>
>> Almost no overhead! Why does this happen? Paul or Robert may correct me,
>> but it seems that most of the wasted space after compaction is
>> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
>> make compaction checkpoints more rarely and use a bigger buffer for docs
>> allows it to build the resulting btree in the new database file in a
>> more optimized way. As the downside of such a configuration, if your
>> compaction fails, it has to restart from further back, and a bigger
>> buffer size requires more memory.
>>
>> Try to play with these options and see how they affect your
>> databases.
>>
>> P.S. This issue is solved in the upcoming 2.0 with the default config.
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <sh...@gmail.com> wrote:
>> > yes the databases were recently compacted - both the databases run as
>> > insert only (no deletion for either).
>> > database2 completed compaction about 4 hours ago and I've triggered
>> > compaction again (so what you see below for database2 could be
>> misleading)
>> >
>> > database1:
>> > {
>> >    "db_name":"database1",
>> >    "doc_count":13337224,
>> >    "doc_del_count":0,
>> >    "update_seq":13337224,
>> >    "purge_seq":0,
>> >    "compact_running":false,
>> >    "disk_size":8574615674,
>> >    "data_size":6896805847,
>> >    "instance_start_time":"1422157234994080",
>> >    "disk_format_version":6,
>> >    "committed_update_seq":13337224
>> > }
>> >
>> > database2:
>> > {
>> >    "db_name":"database2",
>> >    "doc_count":12982621,
>> >    "doc_del_count":0,
>> >    "update_seq":12982621,
>> >    "purge_seq":0,
>> >    "compact_running":true,
>> >    "disk_size":31587352698,
>> >    "data_size":8026729752,
>> >    "instance_start_time":"1422157235289671",
>> >    "disk_format_version":6,
>> >    "committed_update_seq":12982621
>> > }
>> >
>> > -Sharath
>> >
>> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kx...@gmail.com>
>> wrote:
>> >
>> >> Hm...are you sure that database was recently compacted? How many
>> >> deleted documents in these databases?
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sh...@gmail.com> wrote:
>> >> > Hi Alexander,
>> >> >
>> >> > CouchDB version: 1.6.1
>> >> >
>> >> > database1: "disk_size":8574615674,"data_size":6896805847
>> >> > database2: "disk_size":31587352698,"data_size":8026729752
>> >> >
>> >> > -Sharath
>> >> >
>> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Hi Sharath,
>> >> >>
>> >> >> What is your CouchDB version?
>> >> >> Could you provide data_size and disk_size values from database info
>> for
>> >> >> both?
>> >> >> curl http://localhost:5984/db1
>> >> >> curl http://localhost:5984/db2
>> >> >> --
>> >> >> ,,,^..^,,,
>> >> >>
>> >> >>
>> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com>
>> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > recently moved to couchdb and find my databases taking a lot of
>> >> diskspace
>> >> >> >
>> >> >> > I have two database both with json documents (no attachments) -
>> >> however
>> >> >> the
>> >> >> > sizes vary by a lot
>> >> >> >
>> >> >> > database1      size 8.0GB    number of documents: 13337224
>> >> >> > database2      size 29.4 GB    number of documents: 12981148
>> >> >> >
>> >> >> > both the databases have been compacted
>> >> >> >
>> >> >> > each document in database1 is 487 bytes long (including _id and
>> _rev)
>> >> >> > each document in database2 is 564 bytes long (including _id and
>> _rev)
>> >> >> >
>> >> >> > database1 should be ~6.1GB (only data without compression) [487 *
>> >> >> 13337224
>> >> >> > / 1024 /1024]
>> >> >> > database2 should be ~6.9GB (only data without compression) [564 *
>> >> >> 12981148
>> >> >> > / 1024 /1024]
>> >> >> >
>> >> >> > I'm curious why the database file takes 29 GB.
>> >> >> >
>> >> >> > unfortunately I cannot post the document as this is prod data.
>> >> >> >
>> >> >> > CouchDB is running on my Mac (10.10.1) with default configuration.
>> >> >> >
>> >> >> > database1 was populated by a bulk upload from a mysql extract and
>> >> >> database
>> >> >> > 2 was populated by individual document inserts (put) database
>> >> compaction
>> >> >> > was let to complete (took ~30hr on database 2)
>> >> >> >
>> >> >> > is there a command that compacts superfluous data? or am i missing
>> >> >> anything?
>> >> >> >
>> >> >> >
>> >> >> > thanks!
>> >> >> >
>> >> >> > -Sharath
>> >> >>
>> >>
>>
>
>

Re: couchdb database size

Posted by Sharath <sh...@gmail.com>.
Thanks - I've set the following values:
checkpoint_after = 524288000
doc_buffer_size = 52428800

and started the compact process. Have to wait for a bit.

-Sharath

On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Ok, so far, this looks exactly what I have for my hashes databases:
>
> data_size: 557537537
> disk_size: 1542664311
> doc_count: 1298255
> doc_del_count: 18
> avg doc size: ~350 bytes
>
> While there is a 3x disk_size/data_size ratio, this database is
> uncompactable: CouchDB isn't able to get it to 500MB size, leaving it
> at 1.5GB. This looks like some "specifics" of the underlying database
> format, which isn't able to rationally allocate a huge amount of tiny
> documents... But CouchDB provides two interesting options to
> configure database compaction: doc_buffer_size and checkpoint_after.
>
> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>
> By default they have the following values:
>
> checkpoint_after = 5242880
> doc_buffer_size = 524288
>
> And this makes my hashes database stop at the 1.5GB point. If I
> multiply them both by 10, after compaction the database size will be
> ~900MB - yay! If I do this again with the resulting config:
>
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> Then the database sizes will be much better:
>
> disk_size: 633688183
> data_size: 556759808
>
> Almost no overhead! Why does this happen? Paul or Robert may correct me,
> but it seems that most of the wasted space after compaction is
> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
> make compaction checkpoints more rarely and use a bigger buffer for docs
> allows it to build the resulting btree in the new database file in a
> more optimized way. As the downside of such a configuration, if your
> compaction fails, it has to restart from further back, and a bigger
> buffer size requires more memory.
>
> Try to play with these options and see how they affect your
> databases.
>
> P.S. This issue is solved in the upcoming 2.0 with the default config.
> --
> ,,,^..^,,,
>
>
> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <sh...@gmail.com> wrote:
> > yes the databases were recently compacted - both the databases run as
> > insert only (no deletion for either).
> > database2 completed compaction about 4 hours ago and I've triggered
> > compaction again (so what you see below for database2 could be
> misleading)
> >
> > database1:
> > {
> >    "db_name":"database1",
> >    "doc_count":13337224,
> >    "doc_del_count":0,
> >    "update_seq":13337224,
> >    "purge_seq":0,
> >    "compact_running":false,
> >    "disk_size":8574615674,
> >    "data_size":6896805847,
> >    "instance_start_time":"1422157234994080",
> >    "disk_format_version":6,
> >    "committed_update_seq":13337224
> > }
> >
> > database2:
> > {
> >    "db_name":"database2",
> >    "doc_count":12982621,
> >    "doc_del_count":0,
> >    "update_seq":12982621,
> >    "purge_seq":0,
> >    "compact_running":true,
> >    "disk_size":31587352698,
> >    "data_size":8026729752,
> >    "instance_start_time":"1422157235289671",
> >    "disk_format_version":6,
> >    "committed_update_seq":12982621
> > }
> >
> > -Sharath
> >
> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kx...@gmail.com>
> wrote:
> >
> >> Hm...are you sure that database was recently compacted? How many
> >> deleted documents in these databases?
> >> --
> >> ,,,^..^,,,
> >>
> >>
> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sh...@gmail.com> wrote:
> >> > Hi Alexander,
> >> >
> >> > CouchDB version: 1.6.1
> >> >
> >> > database1: "disk_size":8574615674,"data_size":6896805847
> >> > database2: "disk_size":31587352698,"data_size":8026729752
> >> >
> >> > -Sharath
> >> >
> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com>
> >> wrote:
> >> >
> >> >> Hi Sharath,
> >> >>
> >> >> What is your CouchDB version?
> >> >> Could you provide data_size and disk_size values from database info
> for
> >> >> both?
> >> >> curl http://localhost:5984/db1
> >> >> curl http://localhost:5984/db2
> >> >> --
> >> >> ,,,^..^,,,
> >> >>
> >> >>
> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com>
> wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > recently moved to couchdb and find my databases taking a lot of
> >> diskspace
> >> >> >
> >> >> > I have two database both with json documents (no attachments) -
> >> however
> >> >> the
> >> >> > sizes vary by a lot
> >> >> >
> >> >> > database1      size 8.0GB    number of documents: 13337224
> >> >> > database2      size 29.4 GB    number of documents: 12981148
> >> >> >
> >> >> > both the databases have been compacted
> >> >> >
> >> >> > each document in database1 is 487 bytes long (including _id and
> _rev)
> >> >> > each document in database2 is 564 bytes long (including _id and
> _rev)
> >> >> >
> >> >> > database1 should be ~6.1GB (only data without compression) [487 *
> >> >> 13337224
> >> >> > / 1024 /1024]
> >> >> > database2 should be ~6.9GB (only data without compression) [564 *
> >> >> 12981148
> >> >> > / 1024 /1024]
> >> >> >
> >> >> > I'm curious why the database file takes 29 GB.
> >> >> >
> >> >> > unfortunately I cannot post the document as this is prod data.
> >> >> >
> >> >> > CouchDB is running on my Mac (10.10.1) with default configuration.
> >> >> >
> >> >> > database1 was populated by a bulk upload from a mysql extract and
> >> >> database
> >> >> > 2 was populated by individual document inserts (put) database
> >> compaction
> >> >> > was let to complete (took ~30hr on database 2)
> >> >> >
> >> >> > is there a command that compacts superfluous data? or am i missing
> >> >> anything?
> >> >> >
> >> >> >
> >> >> > thanks!
> >> >> >
> >> >> > -Sharath
> >> >>
> >>
>

Re: couchdb database size

Posted by Alexander Shorin <kx...@gmail.com>.
Ok, so far, this looks exactly what I have for my hashes databases:

data_size: 557537537
disk_size: 1542664311
doc_count: 1298255
doc_del_count: 18
avg doc size: ~350 bytes

While there is a 3x disk_size/data_size ratio, this database is
uncompactable: CouchDB isn't able to get it to 500MB size, leaving it
at 1.5GB. This looks like some "specifics" of the underlying database
format, which isn't able to rationally allocate a huge amount of tiny
documents... But CouchDB provides two interesting options to
configure database compaction: doc_buffer_size and checkpoint_after.
http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction

By default they have the following values:

checkpoint_after = 5242880
doc_buffer_size = 524288

And this makes my hashes database stop at the 1.5GB point. If I
multiply them both by 10, after compaction the database size will be
~900MB - yay! If I do this again with the resulting config:

checkpoint_after = 524288000
doc_buffer_size = 52428800

Then the database sizes will be much better:

disk_size: 633688183
data_size: 556759808

Almost no overhead! Why does this happen? Paul or Robert may correct me,
but it seems that most of the wasted space after compaction is
consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
make compaction checkpoints more rarely and use a bigger buffer for docs
allows it to build the resulting btree in the new database file in a
more optimized way. As the downside of such a configuration, if your
compaction fails, it has to restart from further back, and a bigger
buffer size requires more memory.

Try to play with these options and see how they affect your databases.

P.S. This issue is solved in the upcoming 2.0 with the default config.
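
For reference, these knobs live in the [database_compaction] section of the
config (per the docs linked above), and compaction is then re-triggered per
database; a sketch, with database2 standing in for your db:

[database_compaction]
doc_buffer_size = 52428800
checkpoint_after = 524288000

curl -X POST http://localhost:5984/database2/_compact \
     -H 'Content-Type: application/json'
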
--
,,,^..^,,,


On Sun, Jan 25, 2015 at 9:52 AM, Sharath <sh...@gmail.com> wrote:
> yes the databases were recently compacted - both the databases run as
> insert only (no deletion for either).
> database2 completed compaction about 4 hours ago and I've triggered
> compaction again (so what you see below for database2 could be misleading)
>
> database1:
> {
>    "db_name":"database1",
>    "doc_count":13337224,
>    "doc_del_count":0,
>    "update_seq":13337224,
>    "purge_seq":0,
>    "compact_running":false,
>    "disk_size":8574615674,
>    "data_size":6896805847,
>    "instance_start_time":"1422157234994080",
>    "disk_format_version":6,
>    "committed_update_seq":13337224
> }
>
> database2:
> {
>    "db_name":"database2",
>    "doc_count":12982621,
>    "doc_del_count":0,
>    "update_seq":12982621,
>    "purge_seq":0,
>    "compact_running":true,
>    "disk_size":31587352698,
>    "data_size":8026729752,
>    "instance_start_time":"1422157235289671",
>    "disk_format_version":6,
>    "committed_update_seq":12982621
> }
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kx...@gmail.com> wrote:
>
>> Hm...are you sure that database was recently compacted? How many
>> deleted documents in these databases?
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sh...@gmail.com> wrote:
>> > Hi Alexander,
>> >
>> > CouchDB version: 1.6.1
>> >
>> > database1: "disk_size":8574615674,"data_size":6896805847
>> > database2: "disk_size":31587352698,"data_size":8026729752
>> >
>> > -Sharath
>> >
>> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com>
>> wrote:
>> >
>> >> Hi Sharath,
>> >>
>> >> What is your CouchDB version?
>> >> Could you provide data_size and disk_size values from database info for
>> >> both?
>> >> curl http://localhost:5984/db1
>> >> curl http://localhost:5984/db2
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com> wrote:
>> >> > Hi All,
>> >> >
>> >> > recently moved to couchdb and find my databases taking a lot of
>> diskspace
>> >> >
>> >> > I have two database both with json documents (no attachments) -
>> however
>> >> the
>> >> > sizes vary by a lot
>> >> >
>> >> > database1      size 8.0GB    number of documents: 13337224
>> >> > database2      size 29.4 GB    number of documents: 12981148
>> >> >
>> >> > both the databases have been compacted
>> >> >
>> >> > each document in database1 is 487 bytes long (including _id and _rev)
>> >> > each document in database2 is 564 bytes long (including _id and _rev)
>> >> >
>> >> > database1 should be ~6.1GB (only data without compression) [487 *
>> >> 13337224
>> >> > / 1024 /1024]
>> >> > database2 should be ~6.9GB (only data without compression) [564 *
>> >> 12981148
>> >> > / 1024 /1024]
>> >> >
>> >> > I'm curious why the database file takes 29 GB.
>> >> >
>> >> > unfortunately I cannot post the document as this is prod data.
>> >> >
>> >> > CouchDB is running on my Mac (10.10.1) with default configuration.
>> >> >
>> >> > database1 was populated by a bulk upload from a mysql extract and
>> >> database
>> >> > 2 was populated by individual document inserts (put) database
>> compaction
>> >> > was let to complete (took ~30hr on database 2)
>> >> >
>> >> > is there a command that compacts superfluous data? or am i missing
>> >> anything?
>> >> >
>> >> >
>> >> > thanks!
>> >> >
>> >> > -Sharath
>> >>
>>

Re: couchdb database size

Posted by Sharath <sh...@gmail.com>.
Yes, the databases were recently compacted; both databases run as
insert-only (no deletions for either).
database2 completed compaction about 4 hours ago, and I've triggered
compaction again (so what you see below for database2 could be misleading).

database1:
{
   "db_name":"database1",
   "doc_count":13337224,
   "doc_del_count":0,
   "update_seq":13337224,
   "purge_seq":0,
   "compact_running":false,
   "disk_size":8574615674,
   "data_size":6896805847,
   "instance_start_time":"1422157234994080",
   "disk_format_version":6,
   "committed_update_seq":13337224
}

database2:
{
   "db_name":"database2",
   "doc_count":12982621,
   "doc_del_count":0,
   "update_seq":12982621,
   "purge_seq":0,
   "compact_running":true,
   "disk_size":31587352698,
   "data_size":8026729752,
   "instance_start_time":"1422157235289671",
   "disk_format_version":6,
   "committed_update_seq":12982621
}

-Sharath

On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Hm...are you sure that database was recently compacted? How many
> deleted documents in these databases?
> --
> ,,,^..^,,,
>
>
> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sh...@gmail.com> wrote:
> > Hi Alexander,
> >
> > CouchDB version: 1.6.1
> >
> > database1: "disk_size":8574615674,"data_size":6896805847
> > database2: "disk_size":31587352698,"data_size":8026729752
> >
> > -Sharath
> >
> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com>
> wrote:
> >
> >> Hi Sharath,
> >>
> >> What is your CouchDB version?
> >> Could you provide data_size and disk_size values from database info for
> >> both?
> >> curl http://localhost:5984/db1
> >> curl http://localhost:5984/db2
> >> --
> >> ,,,^..^,,,
> >>
> >>
> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > recently moved to couchdb and find my databases taking a lot of
> diskspace
> >> >
> >> > I have two database both with json documents (no attachments) -
> however
> >> the
> >> > sizes vary by a lot
> >> >
> >> > database1      size 8.0GB    number of documents: 13337224
> >> > database2      size 29.4 GB    number of documents: 12981148
> >> >
> >> > both the databases have been compacted
> >> >
> >> > each document in database1 is 487 bytes long (including _id and _rev)
> >> > each document in database2 is 564 bytes long (including _id and _rev)
> >> >
> >> > database1 should be ~6.1GB (only data without compression) [487 *
> >> 13337224
> >> > / 1024 /1024]
> >> > database2 should be ~6.9GB (only data without compression) [564 *
> >> 12981148
> >> > / 1024 /1024]
> >> >
> >> > I'm curious why the database file takes 29 GB.
> >> >
> >> > unfortunately I cannot post the document as this is prod data.
> >> >
> >> > CouchDB is running on my Mac (10.10.1) with default configuration.
> >> >
> >> > database1 was populated by a bulk upload from a mysql extract and
> >> database
> >> > 2 was populated by individual document inserts (put) database
> compaction
> >> > was let to complete (took ~30hr on database 2)
> >> >
> >> > is there a command that compacts superfluous data? or am i missing
> >> anything?
> >> >
> >> >
> >> > thanks!
> >> >
> >> > -Sharath
> >>
>

Re: couchdb database size

Posted by Alexander Shorin <kx...@gmail.com>.
Hm...are you sure that database was recently compacted? How many
deleted documents in these databases?
--
,,,^..^,,,


On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sh...@gmail.com> wrote:
> Hi Alexander,
>
> CouchDB version: 1.6.1
>
> database1: "disk_size":8574615674,"data_size":6896805847
> database2: "disk_size":31587352698,"data_size":8026729752
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com> wrote:
>
>> Hi Sharath,
>>
>> What is your CouchDB version?
>> Could you provide data_size and disk_size values from database info for
>> both?
>> curl http://localhost:5984/db1
>> curl http://localhost:5984/db2
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com> wrote:
>> > Hi All,
>> >
>> > recently moved to couchdb and find my databases taking a lot of diskspace
>> >
>> > I have two database both with json documents (no attachments) - however
>> the
>> > sizes vary by a lot
>> >
>> > database1      size 8.0GB    number of documents: 13337224
>> > database2      size 29.4 GB    number of documents: 12981148
>> >
>> > both the databases have been compacted
>> >
>> > each document in database1 is 487 bytes long (including _id and _rev)
>> > each document in database2 is 564 bytes long (including _id and _rev)
>> >
>> > database1 should be ~6.1GB (only data without compression) [487 *
>> 13337224
>> > / 1024 /1024]
>> > database2 should be ~6.9GB (only data without compression) [564 *
>> 12981148
>> > / 1024 /1024]
>> >
>> > I'm curious why the database file takes 29 GB.
>> >
>> > unfortunately I cannot post the document as this is prod data.
>> >
>> > CouchDB is running on my Mac (10.10.1) with default configuration.
>> >
>> > database1 was populated by a bulk upload from a mysql extract and
>> database
>> > 2 was populated by individual document inserts (put) database compaction
>> > was let to complete (took ~30hr on database 2)
>> >
>> > is there a command that compacts superfluous data? or am i missing
>> anything?
>> >
>> >
>> > thanks!
>> >
>> > -Sharath
>>

Re: couchdb database size

Posted by Sharath <sh...@gmail.com>.
Hi Alexander,

CouchDB version: 1.6.1

database1: "disk_size":8574615674,"data_size":6896805847
database2: "disk_size":31587352698,"data_size":8026729752

-Sharath

On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Hi Sharath,
>
> What is your CouchDB version?
> Could you provide data_size and disk_size values from database info for
> both?
> curl http://localhost:5984/db1
> curl http://localhost:5984/db2
> --
> ,,,^..^,,,
>
>
> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com> wrote:
> > Hi All,
> >
> > recently moved to couchdb and find my databases taking a lot of diskspace
> >
> > I have two database both with json documents (no attachments) - however
> the
> > sizes vary by a lot
> >
> > database1      size 8.0GB    number of documents: 13337224
> > database2      size 29.4 GB    number of documents: 12981148
> >
> > both the databases have been compacted
> >
> > each document in database1 is 487 bytes long (including _id and _rev)
> > each document in database2 is 564 bytes long (including _id and _rev)
> >
> > database1 should be ~6.1GB (only data without compression) [487 *
> 13337224
> > / 1024 /1024]
> > database2 should be ~6.9GB (only data without compression) [564 *
> 12981148
> > / 1024 /1024]
> >
> > I'm curious why the database file takes 29 GB.
> >
> > unfortunately I cannot post the document as this is prod data.
> >
> > CouchDB is running on my Mac (10.10.1) with default configuration.
> >
> > database1 was populated by a bulk upload from a mysql extract and
> database
> > 2 was populated by individual document inserts (put) database compaction
> > was let to complete (took ~30hr on database 2)
> >
> > is there a command that compacts superfluous data? or am i missing
> anything?
> >
> >
> > thanks!
> >
> > -Sharath
>

Re: couchdb database size

Posted by Alexander Shorin <kx...@gmail.com>.
Hi Sharath,

What is your CouchDB version?
Could you provide data_size and disk_size values from database info for both?
curl http://localhost:5984/db1
curl http://localhost:5984/db2
--
,,,^..^,,,


On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sh...@gmail.com> wrote:
> Hi All,
>
> recently moved to couchdb and find my databases taking a lot of diskspace
>
> I have two database both with json documents (no attachments) - however the
> sizes vary by a lot
>
> database1      size 8.0GB    number of documents: 13337224
> database2      size 29.4 GB    number of documents: 12981148
>
> both the databases have been compacted
>
> each document in database1 is 487 bytes long (including _id and _rev)
> each document in database2 is 564 bytes long (including _id and _rev)
>
> database1 should be ~6.1GB (only data without compression) [487 * 13337224
> / 1024 /1024]
> database2 should be ~6.9GB (only data without compression) [564 * 12981148
> / 1024 /1024]
>
> I'm curious why the database file takes 29 GB.
>
> unfortunately I cannot post the document as this is prod data.
>
> CouchDB is running on my Mac (10.10.1) with default configuration.
>
> database1 was populated by a bulk upload from a mysql extract and database
> 2 was populated by individual document inserts (put) database compaction
> was let to complete (took ~30hr on database 2)
>
> is there a command that compacts superfluous data? or am i missing anything?
>
>
> thanks!
>
> -Sharath