Posted to dev@couchdb.apache.org by "Paul Joseph Davis (JIRA)" <ji...@apache.org> on 2009/11/11 20:26:39 UTC

[jira] Commented: (COUCHDB-568) When delayed_commits = true, keep updated btree nodes in memory until the commit

    [ https://issues.apache.org/jira/browse/COUCHDB-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776593#action_12776593 ] 

Paul Joseph Davis commented on COUCHDB-568:
-------------------------------------------

Definitely an interesting idea. In the thread on parallelized btrees, I was thinking that we could take Damien's modifications to couch_db_updater.erl and push all of that logic down into couch_btree.erl, which would let multiple mappers make more efficient use of btree updates during view generation.

Allowing a btree to hold new nodes in RAM for some time before they are synced should give the same kind of speedup from condensed tree writes that batch docs did.
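
A rough way to see why holding nodes in RAM helps: in an append-only btree like CouchDB's, every update rewrites the nodes on its root-to-leaf path, so writing each document out separately appends the shared inner nodes over and over. The Python sketch below is a toy model, not couch_btree. It assumes a perfectly balanced tree with a hypothetical FANOUT and DEPTH, assumes each rewritten node costs one append, and ignores node sizes and the file header. It only counts appends with and without holding dirty nodes until the commit.

```python
# Toy model of write amplification in an append-only, copy-on-write btree.
# Assumptions (hypothetical, not CouchDB's couch_btree): a perfectly balanced
# tree, every update rewrites its whole root-to-leaf path, and each rewritten
# node costs exactly one append to the file.

FANOUT = 16  # hypothetical children per node
DEPTH = 3    # hypothetical levels: root, one inner level, leaves (16**3 keys)


def path(key: int) -> list[tuple[int, int]]:
    """(level, node_index) for every node from the root (level 0) to the leaf."""
    return [(level, key // FANOUT ** (DEPTH - level)) for level in range(DEPTH)]


def appends_commit_each_update(keys: list[int]) -> int:
    """Each update is written out on its own: the full path is appended every time."""
    return sum(len(path(k)) for k in keys)


def appends_hold_until_commit(keys: list[int]) -> int:
    """Dirty nodes stay in RAM; at commit each distinct dirty node is appended once."""
    dirty: set[tuple[int, int]] = set()
    for k in keys:
        dirty.update(path(k))
    return len(dirty)


updates = list(range(100))
print(appends_commit_each_update(updates), appends_hold_until_commit(updates))  # prints "300 9"
```

For 100 updates to keys 0 through 99, writing each update separately costs 300 appends (3 nodes per path), while holding dirty nodes until the commit costs 9 (the root, one inner node, and 7 leaves). The exact ratio depends entirely on the assumed tree shape and key distribution; the point is only that shared inner nodes get written once per commit instead of once per update.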

However, this could allow readers to see a database in a state that was never on disk. With batch writes we never allowed readers to see the docs in the write buffer, so the progression of db state was unidirectional. With pending writes viewable, an error before the commit (a crash, say) could make the btree state go backwards: readers may already have seen updates that are gone once the database is reopened. If that makes sense.
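
The visibility concern can be made concrete with a toy key/value store standing in for the btree. Everything here is hypothetical (`DelayedCommitStore`, `commit`, and `crash` are illustrative names, not CouchDB APIs): `commit` stands in for the delayed commit and `crash` for the process dying before it. The flag `readers_see_pending` toggles whether reads can see updates that are still held only in memory.

```python
# Toy illustration of readers seeing uncommitted state. All names are
# hypothetical; a dict stands in for the btree, commit() for the delayed
# commit, and crash() for the process dying before that commit.
from typing import Optional


class DelayedCommitStore:
    def __init__(self, readers_see_pending: bool) -> None:
        self.readers_see_pending = readers_see_pending
        self.durable: dict[str, int] = {}  # state that survives a crash
        self.pending: dict[str, int] = {}  # updates held in memory until commit

    def write(self, key: str, value: int) -> None:
        self.pending[key] = value

    def read(self, key: str) -> Optional[int]:
        if self.readers_see_pending and key in self.pending:
            return self.pending[key]
        return self.durable.get(key)

    def commit(self) -> None:
        self.durable.update(self.pending)
        self.pending.clear()

    def crash(self) -> None:
        # Uncommitted updates are lost; only the durable state remains.
        self.pending.clear()


def observed_history(readers_see_pending: bool) -> list[Optional[int]]:
    """Values of "doc" a reader sees just before and just after a crash."""
    store = DelayedCommitStore(readers_see_pending)
    store.write("doc", 1)
    store.commit()
    store.write("doc", 2)  # held in memory, not yet committed
    seen = [store.read("doc")]
    store.crash()
    seen.append(store.read("doc"))
    return seen


print(observed_history(True))   # prints [2, 1]: the observed state went backwards
print(observed_history(False))  # prints [1, 1]
```

With `readers_see_pending=True` the reader sees 2 and then, after the crash, 1, so the state it observed moved backwards. With `False` it sees 1 both times, which matches the batch-write behavior of keeping the write buffer invisible to readers.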

Much to contemplate.

> When delayed_commits = true, keep updated btree nodes in memory until the commit
> --------------------------------------------------------------------------------
>
>                 Key: COUCHDB-568
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-568
>             Project: CouchDB
>          Issue Type: Improvement
>    Affects Versions: 0.10
>            Reporter: Adam Kocoloski
>
> rnewson reported on IRC that the new batch=ok implementation results in significantly larger overhead in the .couch files.  This makes sense; the old batch mode waited 1 second before saving, but the new implementation just updates the doc asynchronously.  With fast hardware and moderate write rates it's likely that each document is being written separately.
> The overhead presumably arises from frequently updated btree inner nodes being written to disk many times over.  I'm interested in exploring a modification of the delayed_commits mode whereby the updated btree nodes are not actually written to disk immediately, but are instead held in memory until the commit.  I'd like to think that this will result in more compact files without any decrease in durability.  New read requests would still be able to access these in-memory nodes.
> I realize the notion that updates go directly to disk is baked pretty deeply into couch_btree, but I still thought this was worth bringing up to a wider audience.
