Posted to user@couchdb.apache.org by Jean-Yves Moulin <je...@eileo.com> on 2014/04/22 16:09:59 UTC

Issues with terabyte-scale databases

Hi everybody,

we have been using CouchDB in production for more than two years now, and we are almost happy with it :-) We have a heavy write workload, with very few updates, and we never delete data. Some of our databases are terabytes in size, with billions of documents (sometimes 20 million docs per day). But we are experiencing some issues, and the only solution was to split our data: today we create a new database each week, with even and odd weeks on two different servers (so we have on-line and off-line servers). This is not perfect, and we look forward to BigCouch :-)
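
The weekly split is nothing fancy. For illustration, here is a rough sketch of the routing logic (the server names, database naming scheme and the Python itself are just an example, not our actual code):

from datetime import date

# One database per ISO week; even weeks go to one server, odd weeks to the other.
SERVERS = [
    "http://couch-a.example.com:5984",   # even ISO weeks
    "http://couch-b.example.com:5984",   # odd ISO weeks
]

def weekly_target(day):
    """Return the base URL of the database that receives writes for `day`."""
    year, week, _ = day.isocalendar()
    server = SERVERS[week % 2]
    return "%s/events_%dw%02d" % (server, year, week)

print(weekly_target(date(2014, 4, 22)))   # -> .../events_2014w17 on the odd-week server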

Below are some of our current problems with these big databases. For the record, we use couchdb-1.2 and couchdb-1.4 on twelve servers running FreeBSD (because we like ZFS).

I don't know if these issues are known or not (or specific to us).

* Overall speed: we are far from our servers' real performance; it seems that CouchDB is not able to use the full potential of the system. Even with 24 disks in RAID10, we can't go faster than 2000 docs/sec (with an average document size of 1 KB, that's only a few MB/s on disk) on replication or compaction. CPU and disks are almost idle. Tweaking the number of Erlang I/O threads doesn't help.

* Insert time: at 1000 PUTs/sec the insert time is good, even without bulk requests. But it collapses when launching view calculation, replication or compaction. So we use stale views in our applications, and views are refreshed regularly by a crontab script (a sketch of this script appears just below the list). We avoid compaction on live servers: compactions are launched manually on off-line servers only. We also avoid replication on heavily loaded servers.

* Compaction: when the size of a database increases, compaction can take a really, really long time. It would be great if the compaction process could run faster on already-compressed docs. This is our biggest drawback, and it is what forces the weekly database split. The speed also degrades steadily: compaction starts fast (>2000 docs/sec) but slows down to ~100 docs/sec after hundreds of millions of documents.
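
To give an idea of how we work around this, here is a rough sketch of the kind of maintenance script cron runs on an off-line server (the database name, design document and view are made up, and authentication and error handling are omitted):

import json
import time
import urllib.request

COUCH = "http://localhost:5984"   # hypothetical off-line server, no auth
DB = "events_2014w17"             # hypothetical weekly database
DDOC = "stats"                    # hypothetical design document

def get(path):
    with urllib.request.urlopen(COUCH + path) as r:
        return json.loads(r.read())

def post(path):
    req = urllib.request.Request(COUCH + path, data=b"{}",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())

# 1. Refresh the views: querying without stale=ok forces the index to catch up.
#    The application itself always queries with ?stale=ok, so reads never wait.
get("/%s/_design/%s/_view/by_day?limit=0" % (DB, DDOC))

# 2. Compact the database, then poll until it is done.
post("/%s/_compact" % DB)
while get("/%s" % DB).get("compact_running"):
    tasks = [t for t in get("/_active_tasks")
             if t.get("type") == "database_compaction"]
    print(tasks)   # changes_done / total_changes show the slowdown over time
    time.sleep(60)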

Are there other people using CouchDB with this kind of database? How do you handle a write-heavy workload?

Sorry for my English, and thank you for reading.

Best,


Re: Issues with terabyte-scale databases

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Jean-Yves, welcome! Always nice to hear reports from production deployments. Replies inline:

On Apr 22, 2014, at 10:09 AM, Jean-Yves Moulin <je...@eileo.com> wrote:

> Hi everybody,
> 
> we have been using CouchDB in production for more than two years now, and we are almost happy with it :-) We have a heavy write workload, with very few updates, and we never delete data. Some of our databases are terabytes in size, with billions of documents (sometimes 20 million docs per day). But we are experiencing some issues, and the only solution was to split our data: today we create a new database each week, with even and odd weeks on two different servers (so we have on-line and off-line servers). This is not perfect, and we look forward to BigCouch :-)

You and me both :)

> Below are some of our current problems with these big databases. For the record, we use couchdb-1.2 and couchdb-1.4 on twelve servers running FreeBSD (because we like ZFS).
> 
> I don't know if these issues are known or not (or specific to us).
> 
> * Overall speed: we are far from our servers' real performance; it seems that CouchDB is not able to use the full potential of the system. Even with 24 disks in RAID10, we can't go faster than 2000 docs/sec (with an average document size of 1 KB, that's only a few MB/s on disk) on replication or compaction. CPU and disks are almost idle. Tweaking the number of Erlang I/O threads doesn't help.

For replication, there are various parameters that you can supply to allocate more resources to a given job. For your example I might try something like the following, setting worker_processes to roughly 0.8 x the number of cores on the server mediating the replication (e.g. 20 on a 24-core box):

{
  "source": "...",
  "target": "...",
  "worker_processes": 20,
  "worker_batch_size": 500,
  "http_connections": 300
}

though even with those settings I don't usually see the replicator exceed 4000 docs/sec or so. At that point it's generally used up a good chunk of the CPU on a standard dual socket server. Note that mediating the replication on a less-loaded server can sometimes help significantly.
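
In case it helps, here's a minimal sketch of kicking off such a job (Python, with made-up URLs; the same fields can also go into a document in the _replicator database):

import json
import urllib.request

MEDIATOR = "http://couch-idle.example.com:5984"   # hypothetical less-loaded server mediating the job

job = {
    "source": "http://couch-a.example.com:5984/events_2014w17",
    "target": "http://couch-b.example.com:5984/events_2014w17",
    "worker_processes": 20,        # ~0.8 x cores on the mediating server
    "worker_batch_size": 500,
    "http_connections": 300,
}

req = urllib.request.Request(MEDIATOR + "/_replicate",
                             data=json.dumps(job).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))   # checkpoint history for a one-shot replication

Keep in mind that a one-shot POST to /_replicate holds the HTTP connection open until the job finishes, so for long-running jobs a document in the _replicator database is usually more convenient.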

> * Insert time: at 1000 PUTs/sec the insert time is good, even without bulk requests. But it collapses when launching view calculation, replication or compaction. So we use stale views in our applications, and views are refreshed regularly by a crontab script. We avoid compaction on live servers: compactions are launched manually on off-line servers only. We also avoid replication on heavily loaded servers.

We've done some work internally at Cloudant on ensuring that e.g. compaction only runs in the background and does not impact the throughput of interactive operations. We need to do some work to get that into a place where it can be contributed back. I don't have a better answer for this one at the moment.

> * Compaction: when the size of a database increases, compaction can take a really, really long time. It would be great if the compaction process could run faster on already-compressed docs. This is our biggest drawback, and it is what forces the weekly database split. The speed also degrades steadily: compaction starts fast (>2000 docs/sec) but slows down to ~100 docs/sec after hundreds of millions of documents.

There's a new compactor in the works that's significantly faster and also generates smaller post-compaction files. It also eliminates the exponential falloff in throughput that you observed.

> Are there other people using CouchDB with this kind of database? How do you handle a write-heavy workload?

Billion-document databases with 20 million updates per day are certainly within scope for BigCouch. Cheers,

Adam


Re: Issues with terabyte-scale databases

Posted by Vladimir Ralev <vl...@gmail.com>.
Hi, very interesting case. Are you sure you are not exceeding the write
capacity of the RAID? A few MB/s is actually enough to starve a lot of the
older HDDs: remember that they have to reposition themselves for almost
every write when the load is concurrent and out of context, essentially
random-access writes on sequential media. This effect is amplified if you
are partitioning your databases. What errors do you get when you go above
that capacity? Do you see error lines in the log? How is the IOwait? Are
you running with debug logs enabled? Do you monitor your couchjs workers
and the system limits?
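
If debug logging is on, it is worth turning it down on a write-heavy node;
the level can be checked and changed at runtime through the _config API.
A quick sketch, assuming the default port and no admin auth:

import json
import urllib.request

COUCH = "http://localhost:5984"   # assumed local node, no auth configured

# Read the current log level ("debug" is very expensive on a busy node).
with urllib.request.urlopen(COUCH + "/_config/log/level") as r:
    print("current log level:", json.loads(r.read()))

# Lower it to "error" at runtime, without restarting CouchDB.
req = urllib.request.Request(COUCH + "/_config/log/level",
                             data=json.dumps("error").encode(),
                             headers={"Content-Type": "application/json"},
                             method="PUT")
with urllib.request.urlopen(req) as r:
    print("previous value was:", json.loads(r.read()))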

I think your compaction problem can be solved by BigCouch since it tries to
split the databases into smaller partitions, but splitting the data into
separate databases manually is still common.

