Posted to user@couchdb.apache.org by Anik Das <ma...@gmail.com> on 2016/02/02 22:08:27 UTC

Storage Issues on 600,000 document insertion

Hello All,

We were developing an application where we had to insert approximately 600,000
documents into a database. The database had only one view (value emitted as
null).

It was not a batch insertion. After the insertion the database took up 3.5GB,
to our surprise. I googled around and ran a compact query. After the compact
query the size dropped to 350MB.

I am new to CouchDB and I'm unable to figure out what exactly happened.

Anik Das


Re: Storage Issues on 600,000 document insertion

Posted by Alexander Shorin <kx...@gmail.com>.
On Sat, Feb 6, 2016 at 5:44 PM, Florian Westreicher <st...@meredrica.org> wrote:
> Just a random thought but would this also work by replicating the database
> and deleting the old one?
> That way the new database should stay available and not be bothered with
> compaction. Of course the database then needs to be switched and then
> replicated once again to capture all the new changes

Replication cannot transfer:
1. _local docs that are used for replication checkpoints
2. Database security
3. View indexes

Also, it may not produce as compact a result as compaction does.
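
If you do want to try the replicate-into-a-fresh-database route, here is a
minimal sketch of it, assuming a local CouchDB, made-up database names and
admin credentials, and Python's requests library. It also copies the security
object by hand, since replication will not carry it over:

    import requests

    COUCH = "http://localhost:5984"        # assumed server URL
    AUTH = ("admin", "secret")             # hypothetical admin credentials
    SOURCE, TARGET = "mydb", "mydb_fresh"  # hypothetical database names

    # Replicate into a brand-new database; the target file is written in one
    # pass, so it ends up close to the size of a freshly compacted copy.
    requests.post(f"{COUCH}/_replicate",
                  json={"source": f"{COUCH}/{SOURCE}",
                        "target": f"{COUCH}/{TARGET}",
                        "create_target": True},
                  auth=AUTH).raise_for_status()

    # The _security object is not replicated, so copy it explicitly.
    sec = requests.get(f"{COUCH}/{SOURCE}/_security", auth=AUTH).json()
    requests.put(f"{COUCH}/{TARGET}/_security", json=sec, auth=AUTH)

    # View indexes are not replicated either; they are rebuilt on first query.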

--
,,,^..^,,,

Re: Storage Issues on 600,000 document insertion

Posted by Florian Westreicher <st...@meredrica.org>.
Hello!

Just a random thought but would this also work by replicating the database 
and deleting the old one?
That way the new database should stay available and not be bothered with 
compaction. Of course the database then needs to be switched and then 
replicated once again to capture all the new changes



On February 6, 2016 00:06:08 Anik Das <ma...@gmail.com> wrote:

> Thanks Dave,
>
> That was a very comprehensive article. We are currently running compaction
> nightly. That's helping for now.
>
> Regards,



Re: Storage Issues on 600,000 document insertion

Posted by Anik Das <ma...@gmail.com>.
Thanks Dave,

That was a very comprehensive article. We are currently running compaction
nightly. That's helping for now.

Regards,



Re: Storage Issues on 600,000 document insertion

Posted by Dave Cottlehuber <dc...@skunkwerks.at>.
On Tue, 2 Feb 2016, at 10:08 PM, Anik Das wrote:
> Hello All,
> 
> We were developing an application where we had to insert approximately
> 600,000 documents into a database. The database had only one view (value
> emitted as null).
> 
> It was not a batch insertion. After the insertion the database took up
> 3.5GB, to our surprise. I googled around and ran a compact query. After
> the compact query the size dropped to 350MB.
> 
> I am new to CouchDB and I'm unable to figure out what exactly happened.
> 
> Anik Das

Welcome Anik :-)

Some quick points:

- we use a B-tree in CouchDB
- it's append-only
- to find a doc we walk down the tree from the root node
- the root node is always the last node in the .couch btree file
- adding or updating a doc requires appending (in order) the doc, the
changed intermediary levels, and finally the new root node of the tree
- thus a single doc update needs to rewrite at least 2 nodes: itself +
the new root
- as the tree gets wider (more leaf node documents) it also slowly grows
deeper, adding more levels
- this adds more intermediate nodes to be updated as we go along

http://horicky.blogspot.co.at/2008/10/couchdb-implementation.html is a
very nice but old picture of this.
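
You can watch this effect on a live database by comparing the file size on
disk with the size of the live data inside it. A rough sketch, assuming a
local CouchDB, a made-up database name, and Python's requests library (the
field names differ between the 1.x and 2.x+ responses of GET /{db}, so both
are checked):

    import requests

    info = requests.get("http://localhost:5984/mydb").json()  # hypothetical db

    # CouchDB 1.x reports disk_size/data_size at the top level;
    # 2.x+ nests them under "sizes" as "file" and "active".
    sizes = info.get("sizes", {})
    file_size = sizes.get("file", info.get("disk_size"))
    live_data = sizes.get("active", info.get("data_size"))

    # The gap between the two is the append-only overhead that compaction
    # reclaims (e.g. 3.5GB on disk vs ~350MB of live data).
    print("file size:", file_size, "live data:", live_data)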

You should always plan to compact after a big upload or replication, but
a couple of things will ease the pain:

- use _bulk_docs (and do some testing for optimum chunk size)
- upload docs in uuid order (don't rely on couch generated uuids)

Both of these reduce the number of interim updates to the tree: the
first simply by only rewriting the tree at the end of each bulk update, the
second because adding data in sorted order means fewer intermediary nodes
need updating.
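
As an illustration of both points, a minimal sketch, assuming a local
CouchDB, a made-up database and document shape, hypothetical credentials, and
Python's requests library, that pre-generates ids, sorts them, and uploads in
chunks via _bulk_docs:

    import uuid
    import requests

    DB = "http://localhost:5984/mydb"   # hypothetical database URL
    AUTH = ("admin", "secret")          # hypothetical credentials
    BATCH = 1000                        # tune this; the optimum chunk size varies

    # Generate the ids up front and sort them so the docs arrive in id order
    # rather than in random order.
    docs = [{"_id": uuid.uuid4().hex, "value": None} for _ in range(600000)]
    docs.sort(key=lambda d: d["_id"])

    for i in range(0, len(docs), BATCH):
        requests.post(f"{DB}/_bulk_docs",
                      json={"docs": docs[i:i + BATCH]},
                      auth=AUTH).raise_for_status()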

Most people run compaction through a cron job or a similar out-of-hours
scheduling tool.
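
For reference, one shape such a job can take, assuming a local CouchDB,
made-up database and design-document names, hypothetical admin credentials,
and Python's requests library; a cron entry would just invoke this script
out of hours:

    import requests

    COUCH = "http://localhost:5984"   # assumed server URL
    DB = "mydb"                       # hypothetical database name
    DDOC = "mydesign"                 # hypothetical design doc holding the view
    AUTH = ("admin", "secret")        # hypothetical admin credentials
    HDRS = {"Content-Type": "application/json"}

    # Compact the database file itself.
    requests.post(f"{COUCH}/{DB}/_compact",
                  headers=HDRS, auth=AUTH).raise_for_status()

    # Compact the view index built from the design document.
    requests.post(f"{COUCH}/{DB}/_compact/{DDOC}",
                  headers=HDRS, auth=AUTH).raise_for_status()

    # Drop index files for views whose design docs no longer exist.
    requests.post(f"{COUCH}/{DB}/_view_cleanup",
                  headers=HDRS, auth=AUTH).raise_for_status()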

A+
Dave

Re: Storage Issues on 600,000 document insertion

Posted by Anik Das <ma...@gmail.com>.
Thank you Dan, will try this method.

> On Feb 2 2016, at 6:22 pm, Dan Santner <dansantner@me.com> wrote:
>
> We run compaction nightly.
>
> We also don’t need to keep revision history, so if you need it…obviously
> this won’t work for you.



Re: Storage Issues on 600,000 document insertion

Posted by Dan Santner <da...@me.com>.
We run compaction nightly.

We also don’t need to keep revision history, so if you need it…obviously this won’t work for you.
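
Not necessarily how Dan's setup does it, but related to the revision-history
point: CouchDB tracks a per-database limit on how many revisions it remembers,
readable and settable through _revs_limit. A minimal sketch, assuming a local
CouchDB, a made-up database, hypothetical admin credentials, and Python's
requests library:

    import requests

    DB = "http://localhost:5984/mydb"   # hypothetical database URL
    AUTH = ("admin", "secret")          # hypothetical admin credentials

    # Current number of revisions tracked per document (the default is 1000).
    print(requests.get(f"{DB}/_revs_limit", auth=AUTH).text)

    # Lower it if deep revision history is never needed; old revision bodies
    # are still only reclaimed by compaction.
    requests.put(f"{DB}/_revs_limit", data="100", auth=AUTH,
                 headers={"Content-Type": "application/json"}).raise_for_status()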

> On Feb 2, 2016, at 3:17 PM, Anik Das <ma...@gmail.com> wrote:
> 
> Is it a convention to run the compact query at a frequent interval?
> 
> And I would also love to know the reason behind it. (My initial thought is
> that whenever an insertion happens it creates a new revision of the view and
> that could be one reason)
> 
> Thanks in advance. :)
> 
>> On Feb 2 2016, at 6:12 pm, Александр Опак <opak.alexandr@gmail.com> wrote:
>>
>> it's normal :)


Re: Storage Issues on 600,000 document insertion

Posted by Anik Das <ma...@gmail.com>.
Is it a convention to run the compact query at a frequent interval?

And I would also love to know the reason behind it. (My initial thought is
that whenever an insertion happens it creates a new revision of the view and
that could be one reason)

Thanks in advance. :)

> On Feb 2 2016, at 6:12 pm, Александр Опак <opak.alexandr@gmail.com> wrote:
>
> it's normal :)


Re: Storage Issues on 600,000 document insertion

Posted by Александр Опак <op...@gmail.com>.
it's normal :)

2016-02-02 23:08 GMT+02:00 Anik Das <ma...@gmail.com>:
> Hello All,
>
> We were developing an application where we had to insert approximately
> 600,000 documents into a database. The database had only one view (value
> emitted as null).
>
> It was not a batch insertion. After the insertion the database took up
> 3.5GB, to our surprise. I googled around and ran a compact query. After
> the compact query the size dropped to 350MB.
>
> I am new to CouchDB and I'm unable to figure out what exactly happened.
>
> Anik Das
>



-- 
github:
https://github.com/OpakAlex