You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@couchdb.apache.org by "Adam Kocoloski (JIRA)" <ji...@apache.org> on 2010/04/15 16:24:51 UTC

[jira] Created: (COUCHDB-738) more efficient DB compaction (fewer seeks)

more efficient DB compaction (fewer seeks)
------------------------------------------

Key: COUCHDB-738
URL: https://issues.apache.org/jira/browse/COUCHDB-738
Project: CouchDB
Issue Type: Improvement
Components: Database Core
Affects Versions: 0.11, 0.10.1, 0.9.2
Reporter: Adam Kocoloski
Assignee: Adam Kocoloski

CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database. It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree. I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively. The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.

If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests). On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge. Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS. The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.

Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.

This patch might have downsides when documents have a large number of edits. These include an increase in the size of the database and slower view indexing. I expect both to be small effects.

The patch can be applied directly to trunk@934272. Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-738:
-----------------------------------

    Attachment: 738-efficient-compaction-v2.patch

Here's a new patch which applies cleanly to trunk@940505 and has one important change.  In couch_view_updater:load_doc() we call couch_doc:to_doc_info() if we have a #full_doc_info{} record before calling couch_db:open_doc_int().  The reason for this is to skip the rather expensive call to get the entire revision path for the winning revision which shows up in the open_doc_int clause handling #full_doc_info{} records.  With this change I get the following timings on a 100 edits/doc DB:

MegaView
Elapsed time: 5.98 (s)
Elapsed time: 5.85 (s)
Elapsed time: 4.87 (s)

SimpleView
Elapsed time: 0.86 (s)
Elapsed time: 1.05 (s)
Elapsed time: 0.75 (s)

Those are essentially identical to the timings I got earlier for trunk.


> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-738:
-----------------------------------

    Fix Version/s: 1.1

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-738:
--------------------------------------

    Skill Level: Regular Contributors Level (Easy to Medium)

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863630#action_12863630 ] 

Adam Kocoloski commented on COUCHDB-738:
----------------------------------------

Any objections if I go ahead and commit this?

It would still be good to investigate DB size and view indexing performance for DBs with frequently-edited documents, but I think maybe it's best to get this patch into trunk before it rots.

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863632#action_12863632 ] 

Damien Katz commented on COUCHDB-738:
-------------------------------------

Definitely I'd like to see performance metrics of view building on heaviily editted documents before committing this.

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863646#action_12863646 ] 

Adam Kocoloski commented on COUCHDB-738:
----------------------------------------

Ok, here's a first pass.  I ran the usual perf.py script which loads 1000 'simple' and 1000 'unsimple' docs into a DB and computes the MegaView and SimpleView.  I modified the script to update each document 100 times instead of only once.  Here are the numbers for trunk:

DB size: 1.1G pre-compaction, 12MB post

MegaView
Elapsed time: 7.20 (s)
Elapsed time: 5.65 (s)
Elapsed time: 5.13 (s)

SimpleView
Elapsed time: 0.75 (s)
Elapsed time: 1.04 (s)
Elapsed time: 0.74 (s)

Indexing times after compaction
Elapsed time: 4.91 (s)
Elapsed time: 0.74 (s)

Note that only the first measurement included a full load_docs step.  For the latter measurements I just deleted and rebuilt the index files (updating the docs 100 times took several minutes, and I didn't feel like waiting).  Here are the equivalent numbers with the patch:

DB size: 1.5G pre-compaction, 17MB post

MegaView
Elapsed time: 9.03 (s)
Elapsed time: 8.22 (s)
Elapsed time: 5.92 (s)

SimpleView
Elapsed time: 0.99 (s)
Elapsed time: 0.94 (s)
Elapsed time: 0.91 (s)

After compaction
Elapsed time: 5.81 (s)
Elapsed time: 0.90 (s)

There seems to be quite a bit of variance in these numbers.  I usually take the minimum when benchmarking on my laptop, although due to my skipping the load_docs step the minimum is probably a little too good (the seq_tree is fully cached).  Anyway, if I stick with that methodology I'd say that for DBs with 100 edits/document we see

MegaView: 15-18% slower with patch 
SimpleView: 22-23% slower with patch

The database is 35-40% larger, too.

I might try loading a 500 edits / document DB tonight.  I may also look at eprof to see if there are simple optimizations for this use case.  Is the number of edit branches significant?  So far I haven't been introducting any conflicts in the test.

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863670#action_12863670 ] 

Adam Kocoloski commented on COUCHDB-738:
----------------------------------------

An updated eprof:

http://friendpaste.com/44ZPZ4dq88oIUBkOHfWfWh

The big difference is the disappearance of couch_key_tree:get_full_key_paths/4

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863648#action_12863648 ] 

Adam Kocoloski commented on COUCHDB-738:
----------------------------------------

Here are the heavy hitters during part of a MegaView build according to eprof.  I don't know who is calling erlang:setelement/3

http://friendpaste.com/tnh37NykE0QFD6MbDesgB



> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Damien Katz (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867197#action_12867197 ] 

Damien Katz commented on COUCHDB-738:
-------------------------------------

I've been thinking about this issue, and I think storing the whole rev tree in the by_seq index is a bad idea.

I'm also thinking of ways to make the compaction faster.

Store the full_doc_info outside the by_id btree, but store the doc_info instead (with it's pointers to the main doc and conflicts) with a pointer to the full_doc_info as well. On reads, this avoids the overhead of loading up the ful_doc_info just to get the main doc. Updates and replications reads (that request the rev_info) will have to load up the full_doc_info with an extra read, but unlikely to be an extra disk io since it will be close the most recent doc revision.

The by_seq index will also have the doc_info and a pointer to the full_doc_info.

Then on compaction, scan the by_seq index, copying over the full_doc_info and the documents and attachments into a new file. Each full_doc_info should be linked to the next with an offset to next's file pos.

Then scan the newly written full_doc_info, converting them to doc_infos and pointers to the full_doc_info, and writing them out consecutively to the end of the file.

Then sort just this portion of the file, on disk, by the id in the doc_info. This is the most expensive part of the compaction, but sorting things on disk is common problem with lots of open libraries out there that are highly optimized.

Then convert this id sorted portion of the file to btree leaf nodes. Then rescan leaf nodes and build up the inner nodes, recurse until you have single root node left. This is now your by_id index.

Then rescan the full_doc_infos, and write out to the end of the file the doc_infos and pointers back to the full_doc_infos. This is already sorted by_seq.

Then convert this seq sorted portion of the file to btree leaf nodes. Then rescan leaf nodes and build up the inner nodes, recurse until you have single root node left. This is now your by_seq index.

You now a fully compacted database with no wasted btree nodes. I think this will be a lot faster. With the exception of the by_id sorting phase, this eliminates random disk seeks.

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>             Fix For: 1.1
>
>         Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-738:
-----------------------------------

    Attachment: 738-efficient-compaction-v1.patch

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>         Attachments: 738-efficient-compaction-v1.patch
>
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (COUCHDB-738) more efficient DB compaction (fewer seeks)

Posted by "Adam Kocoloski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-738:
-----------------------------------


http://github.com/kocolosk/couchdb/tree/738-efficient-compaction

> more efficient DB compaction (fewer seeks)
> ------------------------------------------
>
>                 Key: COUCHDB-738
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-738
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9.2, 0.10.1, 0.11
>            Reporter: Adam Kocoloski
>            Assignee: Adam Kocoloski
>
> CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database.  It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree.  I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively.  The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file.
> If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests).  On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge.  Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS.  The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster.
> Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator.
> This patch might have downsides when documents have a large number of edits.  These include an increase in the size of the database and slower view indexing.  I expect both to be small effects.
> The patch can be applied directly to trunk@934272.  Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira