Posted to dev@couchdb.apache.org by "Filipe Manana (JIRA)" <ji...@apache.org> on 2011/01/12 02:26:46 UTC

[jira] Created: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater

Batching writes of BTree nodes (when possible) and in the DB updater
--------------------------------------------------------------------

Key: COUCHDB-1023
URL: https://issues.apache.org/jira/browse/COUCHDB-1023
Project: CouchDB
Issue Type: Improvement
Components: Database Core
Reporter: Filipe Manana

Recently I started experimenting with batching writes in the DB updater.

For example, in a test with 100 writers of 1 KB documents, the updater most often collects between 20 and 30 documents to write.

Currently it does one file:write call for each of them. Not only is this slower, it also implies more context switches and stresses the OS/filesystem by allocating a few blocks very often (since we use a pure append-only write mode). The same batching can be applied to BTree node writes.
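A minimal sketch of the batching idea, assuming a single process owns an append-only file (the module and function names batch_writer, start/1 and append/2 are hypothetical, not CouchDB's code): when an append request arrives, the process drains every other append already waiting in its mailbox, assigns each binary its file offset, and issues one file:write/2 with an iolist instead of one call per binary.

```erlang
-module(batch_writer).
-export([start/1, append/2]).

%% Hypothetical sketch: one process owns an append-only file and turns every
%% append request waiting in its mailbox into a single file:write/2 call.
start(Path) ->
    Parent = self(),
    Pid = spawn(fun() ->
                    %% raw files must be used by the process that opened them
                    {ok, Fd} = file:open(Path, [raw, binary, append]),
                    {ok, Eof} = file:position(Fd, eof),
                    Parent ! {self(), ready},
                    loop(Fd, Eof)
                end),
    receive {Pid, ready} -> Pid end.

%% Appends Bin and returns the file offset at which it was written.
append(Writer, Bin) when is_binary(Bin) ->
    Ref = make_ref(),
    Writer ! {append, self(), Ref, Bin},
    receive {Ref, Pos} -> Pos end.

loop(Fd, Eof) ->
    receive
        {append, _, _, _} = First ->
            Batch = drain([First]),
            %% Give each binary its offset and build one iolist for the batch.
            {Replies, IoList, NewEof} =
                lists:foldl(
                  fun({append, From, Ref, Bin}, {Rs, Io, Pos}) ->
                          {[{From, Ref, Pos} | Rs], [Io, Bin], Pos + byte_size(Bin)}
                  end, {[], [], Eof}, Batch),
            ok = file:write(Fd, IoList),   % one write for the whole batch
            lists:foreach(fun({From, Ref, Pos}) -> From ! {Ref, Pos} end, Replies),
            loop(Fd, NewEof)
    end.

%% Collects every append request that is already queued, without blocking.
drain(Acc) ->
    receive
        {append, _, _, _} = Msg -> drain([Msg | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

With many concurrent writers, requests pile up in the mailbox while a write is in progress, so the next pass through loop/2 picks them all up at once; this is similar in spirit to the 20 to 30 documents per batch described above.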

The following branch/patch is an experiment with batching writes:

https://github.com/fdmanana/couchdb/compare/batch_writes

In couch_file there's a quick test method that compares the time taken to write X blocks of size Y versus writing a single block of size X * Y.
Example:

Eshell V5.8.2 (abort with ^G)
1> Apache CouchDB 1.2.0aa777195-git (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/

1> couch_file:test(1000, 30).
multi writes of 30 binaries, each of size 1000 bytes, took 1920us
batch write of 30 binaries, each of size 1000 bytes, took 344us
ok
2>
2> couch_file:test(4000, 30).
multi writes of 30 binaries, each of size 4000 bytes, took 2002us
batch write of 30 binaries, each of size 4000 bytes, took 700us
ok
3>

Roughly a 3x to 5.6x reduction in write time is quite significant, I would say.
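The source of couch_file:test/2 is not shown here, so the following is only a hypothetical re-creation of the same kind of comparison (the module name write_bench and the scratch path /tmp/write_bench.bin are assumptions): Count writes of Size bytes each versus a single file:write/2 of the whole list as an iolist, both timed with timer:tc/1. Absolute numbers will depend on the OS, filesystem and OTP release.

```erlang
-module(write_bench).
-export([test/2]).

%% Hypothetical benchmark: Count separate writes of Size bytes each
%% versus one write of all Count binaries passed as a single iolist.
test(Size, Count) ->
    Bins = [binary:copy(<<$x>>, Size) || _ <- lists:seq(1, Count)],
    Multi = time_writes(fun(Fd) -> [ok = file:write(Fd, B) || B <- Bins] end),
    Batch = time_writes(fun(Fd) -> ok = file:write(Fd, Bins) end),
    io:format("multi writes of ~p binaries, each of size ~p bytes, took ~pus~n",
              [Count, Size, Multi]),
    io:format("batch write of ~p binaries, each of size ~p bytes, took ~pus~n",
              [Count, Size, Batch]),
    {Multi, Batch}.

%% Opens a fresh append-only scratch file, times only the writes, then cleans up.
time_writes(WriteFun) ->
    Path = "/tmp/write_bench.bin",   % hypothetical scratch location
    {ok, Fd} = file:open(Path, [raw, binary, append]),
    {Micros, _} = timer:tc(fun() -> WriteFun(Fd) end),
    ok = file:close(Fd),
    ok = file:delete(Path),
    Micros.
```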

The lower response times are most noticeable when delayed_commits is set to true.
Running a writes-only test with this branch gave me:

http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902e544

While with trunk I got:

http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902eb50

These tests were done on Linux with ext4 (and OTP R14B01).

However, I'm still not 100% sure whether this is worth applying to trunk.
Any thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980495#action_12980495 ] 

Paul Joseph Davis commented on COUCHDB-1023:
--------------------------------------------

In theory the btree update is fine. I'm not entirely familiar with that part of the db updater code, so I can't comment with any authority on that section. I trust that it's not any crazier than changing just enough code to enable multiple writes and whatnot.

One comment I do have is that I would prefer the couch_file API to be more straightforward. For instance, the btree code has to do its own term_to_binary call, when you could just create a couch_file:append_terms/2 function that does it; that would make client code a bit cleaner.
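A rough sketch of what such an API could look like, assuming a hypothetical module append_terms_sketch and a simplified on-disk format (a 32-bit length prefix per item, which is not couch_file's real format): append_binaries/2 writes all items with one file:write/2 and returns their offsets, and append_terms/2 just applies term_to_binary/1 so client code never has to.

```erlang
-module(append_terms_sketch).
-export([append_binaries/2, append_terms/2, pread_term/2]).

%% Writes all binaries with one file:write/2, each prefixed by a 32-bit
%% length (a simplified, hypothetical format), and returns their offsets.
append_binaries(Fd, Bins) ->
    {ok, Eof} = file:position(Fd, eof),
    {Positions, IoList, _} =
        lists:foldl(
          fun(Bin, {Ps, Io, Pos}) ->
                  Size = byte_size(Bin),
                  {[Pos | Ps], [Io, <<Size:32>>, Bin], Pos + 4 + Size}
          end, {[], [], Eof}, Bins),
    ok = file:write(Fd, IoList),
    {ok, lists:reverse(Positions)}.

%% The convenience wrapper: callers pass terms, serialisation happens here.
append_terms(Fd, Terms) ->
    append_binaries(Fd, [term_to_binary(T) || T <- Terms]).

%% Reads back one term written by append_terms/2.
pread_term(Fd, Pos) ->
    {ok, <<Size:32>>} = file:pread(Fd, Pos, 4),
    {ok, Bin} = file:pread(Fd, Pos + 4, Size),
    {ok, binary_to_term(Bin)}.
```

The file is expected to be opened with [raw, binary, read, append], so that every write lands at the end of the file and pread/3 can still read earlier items.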

As a one-off comment, I'm still contemplating extending the fd NIF so that it doesn't break the scheduler, which may make some of these sorts of "optimizations" moot. Depending on the severity of the snowpocalypse tomorrow, I may have the day off, and this sounds like something I might work on.

[jira] Commented: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980496#action_12980496 ] 

Filipe Manana commented on COUCHDB-1023:
----------------------------------------

"One comment I do have is that I would prefer that the couch_file api is more straight forward. For instance, the btree code has to do its own term_to_binary call when you could just create a couch_file:append_terms/2 method that would do that which would make things a bit more clean in client code. "

That was sort of intentional: 1) I wanted to do some quick testing; 2) by not having an append_terms_md5 version, I avoid doing another map to transform each term into a binary.

But I have no objections to that at all.
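If an MD5 variant were added to a term-based API, one way to avoid a second map over the list is to serialise and checksum each term in the same comprehension. This is only a sketch; the module and function names are hypothetical, and how the checksum would be laid out on disk is left open.

```erlang
-module(append_terms_md5_sketch).
-export([prepare/1]).

%% Hypothetical helper: serialise each term and compute its MD5 in a single
%% pass, instead of one map for term_to_binary/1 and another for erlang:md5/1.
prepare(Terms) ->
    [begin
         Bin = term_to_binary(T),
         {erlang:md5(Bin), Bin}
     end || T <- Terms].
```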

[jira] Commented: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980504#action_12980504 ] 

Filipe Manana commented on COUCHDB-1023:
----------------------------------------

Hi Randall, no, I wasn't aware of your experiment.

From a quick look, the main difference seems to be that yours does an extra map/fold over each key tree and then maps each document to its respective summary.

As for calling term_to_binary before a gen_server call, I don't think it offers any gain. Do you, or does anyone, know exactly which is more expensive: converting a term to a binary or copying a term?
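The question can be probed empirically with a micro-benchmark like the hypothetical copy_vs_t2b module below. One relevant fact: binaries larger than 64 bytes live off-heap and are reference-counted, so sending the output of term_to_binary/1 to couch_file copies only a small reference rather than the payload. That is the copy Randall expects to save, at the price of doing the serialisation in the caller. The timing of sends below includes an ack round trip and scheduling, so it overstates the pure copy cost.

```erlang
-module(copy_vs_t2b).
-export([compare/2]).

%% Rough micro-benchmark (hypothetical helper, not part of CouchDB):
%% N calls of term_to_binary/1 versus N sends of the same term to another
%% process, each of which copies the term into the receiver's heap.
compare(Term, N) ->
    Sink = spawn(fun sink/0),
    {T2BMicros, ok} =
        timer:tc(fun() -> repeat(N, fun() -> term_to_binary(Term) end) end),
    %% Each iteration also waits for an ack, so this overstates the copy cost.
    {CopyMicros, ok} =
        timer:tc(fun() ->
                         repeat(N, fun() ->
                                           Sink ! {self(), Term},
                                           receive {Sink, ack} -> ok end
                                   end)
                 end),
    Sink ! stop,
    {T2BMicros, CopyMicros}.

repeat(0, _Fun) -> ok;
repeat(N, Fun) -> _ = Fun(), repeat(N - 1, Fun).

sink() ->
    receive
        {From, _Term} -> From ! {self(), ack}, sink();
        stop -> ok
    end.
```

The answer will depend on the shape and size of the terms (a document summary versus a small btree node), so it is worth running with realistic terms.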

And I don't think the complexity of adding a write-through cache is worth it: more code, one more server, and possibly a new bottleneck. For that I would use the delayed_write option of Erlang's file module.
But I might be wrong. A concrete implementation and some benchmarks would definitely change my mind :)
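A small sketch of the delayed_write alternative, assuming a hypothetical helper module: with {delayed_write, Size, Delay}, the file module buffers written data until at least Size bytes are pending or the oldest data is Delay milliseconds old, then issues one operating-system write. The trade-off is that write/2 may report ok early, and a deferred write error only surfaces on a later file operation such as close/1.

```erlang
-module(delayed_write_sketch).
-export([write_all/2]).

%% Hypothetical helper: many small writes, coalesced by the file module.
write_all(Path, Bins) ->
    %% Buffer up to 64 KB or 2000 ms before issuing a real write.
    {ok, Fd} = file:open(Path, [append, binary, {delayed_write, 64 * 1024, 2000}]),
    lists:foreach(fun(B) -> ok = file:write(Fd, B) end, Bins),
    %% close/1 flushes the buffer; a deferred write error is returned here.
    file:close(Fd).
```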



[jira] Commented: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980498#action_12980498 ] 

Randall Leeds commented on COUCHDB-1023:
----------------------------------------

Didn't I do this work already and not notice any significant gains? I haven't looked to see if maybe you did it differently, but here's a version with an append_terms.

https://github.com/tilgovi/couchdb/tree/realbatchwrite

I also have branches and patches where I experimented with other ways of changing this code path. I do like calling term_to_binary before the gen_server:call to couch_file because it should avoid a copy operation, but you have to consider what other path you're slowing down as a result.

If it doesn't get too complex for no gain, my next thought would be to take the caching code you worked on before and use it to build a write-through cache that we flush asynchronously. The goal would be to let the updater do as much work as possible, up to the flush of the next commit group, while the current one is being written. Something like this?
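The asynchronous-flush part of this idea can be sketched as a two-stage pipeline: the caller (standing in for the updater) serialises commit group N+1 while a separate flusher process is still writing group N, with at most one flush in flight. Everything here is hypothetical: the module name, the use of term_to_binary/1 as the "preparation" step, and the omission of the cache itself (reads of unflushed data) and of fsync/commit handling.

```erlang
-module(async_flush_sketch).
-export([run/2]).

%% Hypothetical write-behind pipeline: prepare group N+1 while group N is
%% being written by a separate flusher process.
run(Path, Groups) ->
    Flusher = spawn_link(fun() -> flusher_init(Path) end),
    Last = lists:foldl(
             fun(Group, Pending) ->
                     %% CPU work here overlaps with the flush still in flight.
                     IoList = [term_to_binary(T) || T <- Group],
                     ok = wait(Pending),        % keep at most one flush in flight
                     Ref = make_ref(),
                     Flusher ! {flush, self(), Ref, IoList},
                     Ref
             end, none, Groups),
    ok = wait(Last),
    Flusher ! stop,
    ok.

wait(none) -> ok;
wait(Ref) -> receive {Ref, Result} -> Result end.

flusher_init(Path) ->
    {ok, Fd} = file:open(Path, [raw, binary, append]),
    flusher_loop(Fd).

flusher_loop(Fd) ->
    receive
        {flush, From, Ref, IoList} ->
            From ! {Ref, file:write(Fd, IoList)},   % one write per group
            flusher_loop(Fd);
        stop ->
            file:close(Fd)
    end.
```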
