Posted to dev@couchdb.apache.org by Wout Mertens <wo...@gmail.com> on 2010/05/14 17:09:11 UTC

Re: ways to improve compaction

Old thread I know, but I was wondering about a way to make compaction more fluid:

On Dec 21, 2009, at 23:20 , Damien Katz wrote:

> I recently saw some issues people were having with compaction, and I thought I'd get some thoughts down about ways to improve the compaction code/experience.
> 
> 1. Multi-process pipeline processing. Similar to the enhancements to the view indexing, there are opportunities to pipeline operations instead of the batched read/write passes compaction does now. This can reduce memory usage and make compaction faster.
> 2. Multiple disks/mount points. CouchDB could easily have 2 or more database dirs, and each time it compacts, it could write the new database file to a different dir/disk/mount point. For servers with multiple disks this would greatly smooth the copying, as the disk heads won't need to seek between reads and writes.
> 3. Better compaction algorithms. There are all sorts of clever things that could be done to make compaction faster. Right now it rebuilds the database much as it would if clients were bulk updating it. This was the simplest way to do it, but certainly not the fastest. There are a lot of ways to make this much more efficient; they just take more work.
> 4. Tracking wasted space. This can be used to determine a threshold for triggering compaction. We don't need to track with 100% accuracy how much disk space is being wasted, but it would be a big improvement to at least know how much disk space the raw docs take, and maybe to estimate the size of the indexes needed to support them in a freshly compacted database.
> 5. Better low-level file driver support. Because we are using the Erlang built-in file system drivers, we don't have access to a lot of flags. If we had our own drivers, one option we'd like is to bypass the OS cache for reads and writes during compaction; caching is unnecessary there, and it can fill the cache with rarely accessed data, evicting lots of recently used live data and greatly hurting the performance of other databases.
> 
> Anyway, just getting these thoughts out. More ideas and especially code welcome.
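
One quick thought on point 4: even very rough accounting might be enough, since the numbers only need to be good enough to decide when a compaction pays off. Something like the following (a made-up module, not actual CouchDB internals):

%% Back-of-the-envelope sketch of wasted-space tracking; not real CouchDB
%% code.  Every time a doc revision is superseded or deleted, add its
%% on-disk size to a running "wasted bytes" counter, then compare that
%% against the file size to decide whether compaction is worthwhile.
-module(waste_tracker).
-export([new/0, note_obsolete/2, should_compact/3]).

new() -> 0.

%% OldSize is the (approximate) on-disk size of the revision that just
%% became garbage.
note_obsolete(OldSize, Wasted) when OldSize >= 0 ->
    Wasted + OldSize.

%% Trigger compaction once the estimated dead fraction of the file
%% crosses Threshold, e.g. should_compact(Wasted, FileSize, 0.5).
should_compact(Wasted, FileSize, Threshold) when FileSize > 0 ->
    Wasted / FileSize >= Threshold.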


How about

6. Store each database in multiple files. Instead of one really big file, use several big chunk-files of fixed maximum length. One chunk-file is "active" and receives writes. Once that chunk-file grows past a certain size, for example 25 MB, start a new file. Then, at compaction time, you can compact one chunk-file at a time.
Possible optimization: if a chunk-file has no outdated documents (or only a small percentage), leave it alone.

I'm armchair-programming here and have only a vague idea of what the on-disk format looks like, but this could allow continuous compaction by (slowly) compacting only the completed chunk-files. Furthermore, it would allow spreading the database across multiple disks (since there are now multiple files per db), although one disk would still be receiving all the writes. A smart write scheduler could make sure different databases have different active disks. Possibly, multiple chunk-files could be active at the same time, providing all sorts of interesting failure scenarios ;-)
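
To make the chunk idea a bit more concrete, here is a rough sketch of just the rollover part. It is purely hypothetical: the module name, the record and the 25 MB limit are placeholders, and it ignores everything couch_file actually has to do.

%% Hypothetical sketch of chunk-file rollover; not how couch_file works.
-module(chunky_db).
-export([open/1, append/2]).

-define(CHUNK_LIMIT, 25 * 1024 * 1024).

-record(db, {dir, seq = 0, fd}).

open(Dir) ->
    ok = filelib:ensure_dir(filename:join(Dir, "x")),
    open_chunk(#db{dir = Dir}).

%% Append a binary to the active chunk.  Once the chunk grows past the
%% limit, close it and start a fresh one; completed chunks are immutable
%% and could be compacted (or skipped) independently.
append(Bin, Db = #db{fd = Fd}) ->
    ok = file:write(Fd, Bin),
    {ok, Size} = file:position(Fd, eof),
    case Size >= ?CHUNK_LIMIT of
        true ->
            ok = file:close(Fd),
            open_chunk(Db#db{seq = Db#db.seq + 1});
        false ->
            {ok, Db}
    end.

open_chunk(Db = #db{dir = Dir, seq = Seq}) ->
    Name = lists:flatten(io_lib:format("chunk_~6..0B.couch", [Seq])),
    {ok, Fd} = file:open(filename:join(Dir, Name), [append, raw, binary]),
    {ok, Db#db{fd = Fd}}.

The appeal is that the compactor would only ever touch closed chunks, so it never has to race the active writer.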

Thoughts?

Wout.

Re: ways to improve compaction

Posted by Adam Kocoloski <ko...@apache.org>.
On May 14, 2010, at 11:09 AM, Wout Mertens wrote:

> Old thread I know, but I was wondering about a way to make compaction more fluid:
> 
> On Dec 21, 2009, at 23:20 , Damien Katz wrote:
> 
>> I recently saw some issues people were having with compaction, and I thought I'd get some thoughts down about ways to improve the compaction code/experience.
>> 
>> 1. Multi-process pipeline processing. Similar to the enhancements to the view indexing, there are opportunities to pipeline operations instead of the batched read/write passes compaction does now. This can reduce memory usage and make compaction faster.
>> 2. Multiple disks/mount points. CouchDB could easily have 2 or more database dirs, and each time it compacts, it could write the new database file to a different dir/disk/mount point. For servers with multiple disks this would greatly smooth the copying, as the disk heads won't need to seek between reads and writes.
>> 3. Better compaction algorithms. There are all sorts of clever things that could be done to make compaction faster. Right now it rebuilds the database much as it would if clients were bulk updating it. This was the simplest way to do it, but certainly not the fastest. There are a lot of ways to make this much more efficient; they just take more work.
>> 4. Tracking wasted space. This can be used to determine a threshold for triggering compaction. We don't need to track with 100% accuracy how much disk space is being wasted, but it would be a big improvement to at least know how much disk space the raw docs take, and maybe to estimate the size of the indexes needed to support them in a freshly compacted database.
>> 5. Better low-level file driver support. Because we are using the Erlang built-in file system drivers, we don't have access to a lot of flags. If we had our own drivers, one option we'd like is to bypass the OS cache for reads and writes during compaction; caching is unnecessary there, and it can fill the cache with rarely accessed data, evicting lots of recently used live data and greatly hurting the performance of other databases.
>> 
>> Anyway, just getting these thoughts out. More ideas and especially code welcome.
> 
> 
> How about
> 
> 6. Store each database in multiple files. Instead of one really big file, use several big chunk-files of fixed maximum length. One chunk-file is "active" and receives writes. Once that chunk-file grows past a certain size, for example 25 MB, start a new file. Then, at compaction time, you can compact one chunk-file at a time.
> Possible optimization: if a chunk-file has no outdated documents (or only a small percentage), leave it alone.
> 
> I'm armchair-programming here and have only a vague idea of what the on-disk format looks like, but this could allow continuous compaction by (slowly) compacting only the completed chunk-files. Furthermore, it would allow spreading the database across multiple disks (since there are now multiple files per db), although one disk would still be receiving all the writes. A smart write scheduler could make sure different databases have different active disks. Possibly, multiple chunk-files could be active at the same time, providing all sorts of interesting failure scenarios ;-)
> 
> Thoughts?
> 
> Wout.

Hi Wout, Robert Newson suggested the very same in the original thread.  It's a solid idea, to be sure.

In related work, there's COUCHDB-738

https://issues.apache.org/jira/browse/COUCHDB-738

I wrote a patch that changes the internal database format so compaction can skip an extra lookup in the by_id tree. It's a huge win for write-once DBs with random docids -- something like a 6x improvement in compaction speed in one test. However, DBs with frequently edited documents become 35-40% larger both pre- and post-compaction.
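
Very loosely sketched (invented names, plain maps instead of the real btrees, and only an approximation of what the patch actually changes), the tradeoff looks something like this:

%% Self-contained sketch of the tradeoff; not the actual patch.
-module(compact_lookup_sketch).
-export([copy_current/2, copy_patched/1]).

%% Current format: by_seq entries carry only the doc id, so the compactor
%% does one by_id lookup per document to recover the full doc info.
copy_current(BySeq, ById) ->
    [{Id, maps:get(Id, ById)} || {_Seq, Id} <- BySeq].

%% Patched format: by_seq entries already carry the full doc info, so the
%% extra lookup goes away -- but that info now lives in both trees, which
%% is roughly why heavily edited DBs come out 35-40% larger.
copy_patched(BySeq) ->
    [{Id, Info} || {_Seq, Id, Info} <- BySeq].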

Damien has proposed a better alternative in that thread, which is a much bigger rewrite of the compaction algorithm.

Best,

Adam