Posted to dev@couchdb.apache.org by Damien Katz <da...@apache.org> on 2009/01/05 18:54:58 UTC
Faster updates, optional ACID
It was brought to my attention that commits on OS X were very slow
with the latest releases of Erlang. After I upgraded to the most
recent version, I found them to be indeed slow, slowing the tests down
to the point where it was painful to run them. It appears that any disk
sync from Erlang now takes somewhere between 50 and 100ms, up from the
previous times of ~5ms. This is almost certainly due to the F_FULLFSYNC
flag the Erlang file handling now uses on Darwin-based systems, but it's
surprising how bad performance is on OS X. A little investigation
showed other database engines have similar issues on OS X.
To address this problem, I implemented delayed commit functionality.
We had always intended to implement delayed commit for performance
reasons, but hadn't had the need until now. This makes updates much
faster in the general case, but with the caveat that they aren't flushed
completely to disk right away. If you can't tolerate the possible loss
of recent updates, you can use the "full commit" option for ACID
commits.
For full ACID commit, add a header field to the doc PUT or _bulk_docs
POST like this:
X-Couch-Full-Commit:true
Then CouchDB will completely commit the change before returning.
Also, if you have several delayed updates and you want to make sure
they all made it to disk, you can invoke POST /db/_ensure_full_commit
and all outstanding commits are flushed to disk.
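A minimal client-side sketch of the two requests described above, in Python. The header name and the _ensure_full_commit path come from this mail; the server address, the database name (borrowed from the test suite) and the document ID are assumptions for illustration only. The requests are built but not sent, so nothing here depends on a running server.

```python
import json
import urllib.request

# Assumption: a local CouchDB on the default port.
COUCH = "http://127.0.0.1:5984"


def full_commit_put(db, doc_id, doc):
    """Build a doc PUT that asks the server to fully commit before returning."""
    body = json.dumps(doc).encode("utf-8")
    req = urllib.request.Request(f"{COUCH}/{db}/{doc_id}", data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    # The per-request switch described in the mail.
    req.add_header("X-Couch-Full-Commit", "true")
    return req


def ensure_full_commit(db):
    """Build the POST that flushes all outstanding delayed commits for a db."""
    req = urllib.request.Request(
        f"{COUCH}/{db}/_ensure_full_commit", data=b"", method="POST"
    )
    req.add_header("Content-Type", "application/json")
    return req


put_req = full_commit_put("test_suite_db", "example-doc", {"value": 1})
flush_req = ensure_full_commit("test_suite_db")
# To actually send either request: urllib.request.urlopen(put_req)
```

Ordinary (delayed) updates would simply omit the header and call the flush request once after the batch.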
The view engine has already been modified to deal with delayed commits
too: it ensures it never fully commits its own indexes to disk if the
documents indexed aren't already committed to disk.
The last remaining work item is db server crash detection, so that
clients can detect when a server has crashed and potentially lost
updates. This is pretty simple: each db server just needs a unique ID
generated at its startup. A client retrieves this value at the beginning
of its writes and then checks that the value is the same once the writes
are done and flushed to disk. If not, we know we may have lost some
updates and we redo the replication from the last known good commit.
Right now the default is to delay the commits, because I think that
will be the most common use case but I'm really not sure. I definitely
want the commits delayed for the test suite, to keep things running
fast.
-Damien
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Jan 2009, at 12:56, Noah Slater wrote:
> On Tue, Jan 06, 2009 at 12:52:26PM +0100, Jan Lehnardt wrote:
>>
>> On 6 Jan 2009, at 12:43, Noah Slater wrote:
>>
>>> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>>>> For full acid commit, add a header field to the doc PUT or
>>>> _bulk_docs
>>>> POST like this:
>>>> X-Couch-Full-Commit:true
>>>
>>> What is the cost/benefit of doing this via HTTP headers vs. query
>>> string?
>>
>> The HTTP headers way is implemented now. Damien said in another
>> mail, that
>> different options for this (per request, and db- and server-wide
>> config)
>> are on lower priority right now, but they will definitely come:
>
> Not sure I follow. This feature is already implemented? My question
> was about
> the per-request way of setting this option. Should it be done using
> HTTP headers
> or should it be done using a query string option?
Heh, I mean: The advantage is that the header version doesn't have to be
implemented, it exists. The query-parameter version doesn't. This is not
about the usefulness of the query-param.
I'd say the header is "more correct" because we are describing request
semantics, not content-options. But I see arguments for the
query-parameter version as well.
Cheers
Jan
--
Re: Faster updates, optional ACID
Posted by Noah Slater <ns...@apache.org>.
On Tue, Jan 06, 2009 at 12:52:26PM +0100, Jan Lehnardt wrote:
>
> On 6 Jan 2009, at 12:43, Noah Slater wrote:
>
>> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>>> For full acid commit, add a header field to the doc PUT or _bulk_docs
>>> POST like this:
>>> X-Couch-Full-Commit:true
>>
>> What is the cost/benefit of doing this via HTTP headers vs. query
>> string?
>
> The HTTP headers way is implemented now. Damien said in another mail, that
> different options for this (per request, and db- and server-wide config)
> are on lower priority right now, but they will definitely come:
Not sure I follow. This feature is already implemented? My question was about
the per-request way of setting this option. Should it be done using HTTP headers
or should it be done using a query string option?
--
Noah Slater, http://tumbolia.org/nslater
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Jan 2009, at 12:43, Noah Slater wrote:
> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>> For full acid commit, add a header field to the doc PUT or _bulk_docs
>> POST like this:
>> X-Couch-Full-Commit:true
>
> What is the cost/benefit of doing this via HTTP headers vs. query
> string?
The HTTP headers way is implemented now. Damien said in another mail
that different options for this (per request, and db- and server-wide
config) are on lower priority right now, but they will definitely come:
"Definitely, I think commit options should be settable per-database.
But for
now I was just wanting to address the slowdown, especially for
replication
and the tests, to keep everyone productive. More commit features and
options is lower priority work for now, I was just addresses the most
serious
slowdown."
-- Damien in <D8...@apache.org>
Cheers
Jan
--
Re: Faster updates, optional ACID
Posted by Noah Slater <ns...@apache.org>.
On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
> For full acid commit, add a header field to the doc PUT or _bulk_docs
> POST like this:
> X-Couch-Full-Commit:true
What is the cost/benefit of doing this via HTTP headers vs. query string?
--
Noah Slater, http://tumbolia.org/nslater
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Jan 2009, at 18:54, Damien Katz wrote:
> The last remaining work item is db server crash detection, so that
> clients can detect when a server has crashed and potentially lost
> updates. This is pretty simple, each db server just needs a unique
> ID generated at it's startup. Client retrieve this value at the
> beginning of the writes and then checks that the value is the same
> once down a flushed to disk. If not, we know we maybe have lost some
> updates and we redo the replication from the last known good commit.
We should document that pattern for clients that use CouchDB
and put in data without replication. Their writes might also not
go in.
> Right now the default is to delay the commits, because I think that
> will be the most common use case but I'm really not sure. I
> definitely want the commits delayed for the test suite, to keep
> things running fast.
+1
Cheers
Jan
--
Re: Faster updates, optional ACID
Posted by Noah Slater <ns...@apache.org>.
On Mon, Jan 05, 2009 at 01:59:45PM -0500, Geir Magnusson Jr. wrote:
> That's cool but it puts the burden on the client - why not make it a
> config option, so that the db admin can choose the durability level in
> general, and let clients that know they are talking to couch override w/
> a header?
+1
--
Noah Slater, http://tumbolia.org/nslater
Re: Faster updates, optional ACID
Posted by Damien Katz <da...@apache.org>.
On Jan 5, 2009, at 2:38 PM, Chris Anderson wrote:
> On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org>
> wrote:
>>>
>>> 1) delayed commit (what you did last night)
>>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>>> provides this level of commit
>>>
>>
>> If necessary and possible, we'll patch the Erlang VM. But if a
>> platform
>> doesn't support proper flushing, then it's not a platform that can
>> support
>> an ACID database.
>>
>
> If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
> when we want to be ACID. On Linux, as far as we know, 2 will support
> ACID. There's only really two options, ACID commit, and delayed
> commit. It's up to CouchDB / Erlang / the OS to make sure that the
> ACID option isn't false advertising. Makes sense.
If fsync doesn't work on a platform (like previously on OS X), then
neither the delayed commit nor the fully flushed commit can work.
The way the delayed commit works, the database on disk is never in an
inconsistent state. The new data may be partially written and the
header in memory points to it, but the header on disk still points to
the old consistent data. So this new option isn't like OS X before the
F_FULLFSYNC patch, because that could actually cause corruptions
(header points to incomplete structures) and you'd lose previously
committed data. This new patch might lose the last uncommitted writes
if it crashes before the final flush, but the previously committed
data is always consistent and instantly available, no fixup needed.
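To illustrate the header behaviour described above, here is a toy in-memory model in Python. It is only a sketch under simplifying assumptions, not CouchDB's actual file format: the "disk" is a list, a header is just the count of records it covers, and all class and method names are made up.

```python
class AppendOnlyStore:
    """Toy model: data is appended, headers only ever point at complete data."""

    def __init__(self):
        self.disk = []        # simulated append-only file contents
        self.disk_header = 0  # header on disk: number of committed records
        self.mem_header = 0   # header in memory: may cover uncommitted records

    def write(self, record):
        # Delayed commit: append data and advance only the in-memory header.
        self.disk.append(record)
        self.mem_header = len(self.disk)

    def full_commit(self):
        # Final flush: the on-disk header now points at the new data.
        self.disk_header = self.mem_header

    def crash_and_reopen(self):
        # After a crash only the on-disk header survives; data written past
        # it is ignored, and the previously committed data is untouched.
        del self.disk[self.disk_header:]
        self.mem_header = self.disk_header

    def read_all(self):
        return self.disk[: self.mem_header]


store = AppendOnlyStore()
store.write("doc1")
store.full_commit()
store.write("doc2")          # delayed, not yet committed
visible_before_crash = store.read_all()
store.crash_and_reopen()
visible_after_crash = store.read_all()
```

In this model a crash can drop "doc2" but can never leave a header pointing at incomplete structures, which is the difference from the pre-F_FULLFSYNC situation.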
-Damien
>
>
>
>> Definitely, I think commit options should be settable per-database.
>> But for
>> now I was just wanting to address the slowdown, especially for
>> replication
>> and the tests, to keep everyone productive. More commit features
>> and options
>> is lower priority work for now, I was just addresses the most serious
>> slowdown.
>
> The config system is so flexible and easy to hook into, that I'm not
> worried about adding these settings. Should be a breeze once the
> software itself is settled.
>
> +1 to the whole thing.
>
> --
> Chris Anderson
> http://jchris.mfdz.com
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
On Jan 5, 2009, at 2:38 PM, Chris Anderson wrote:
> On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org>
> wrote:
>>>
>>> 1) delayed commit (what you did last night)
>>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>>> provides this level of commit
>>>
>>
>> If necessary and possible, we'll patch the Erlang VM. But if a
>> platform
>> doesn't support proper flushing, then it's not a platform that can
>> support
>> an ACID database.
>>
>
> If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
> when we want to be ACID. On Linux, as far as we know, 2 will support
> ACID. There's only really two options, ACID commit, and delayed
> commit. It's up to CouchDB / Erlang / the OS to make sure that the
> ACID option isn't false advertising. Makes sense.
[Disclaimer : I'm not an OS filesystem expert]
Except AFAIK that's not what is happening. fsync() on OS X or Linux
doesn't support the 'deterministic commit to physical media', which is
what I think you think you are getting.
As far as I can tell, only OS X gives you that full durability via
fcntl(F_FULLFSYNC), and it's horribly slow.
That's why I'm proposing three distinct modes (at least).
geir
Re: Faster updates, optional ACID
Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org> wrote:
>>
>> 1) delayed commit (what you did last night)
>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>> provides this level of commit
>>
>
> If necessary and possible, we'll patch the Erlang VM. But if a platform
> doesn't support proper flushing, then it's not a platform that can support
> an ACID database.
>
If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
when we want to be ACID. On Linux, as far as we know, 2 will support
ACID. There's only really two options, ACID commit, and delayed
commit. It's up to CouchDB / Erlang / the OS to make sure that the
ACID option isn't false advertising. Makes sense.
> Definitely, I think commit options should be settable per-database. But for
> now I was just wanting to address the slowdown, especially for replication
> and the tests, to keep everyone productive. More commit features and options
> is lower priority work for now, I was just addresses the most serious
> slowdown.
The config system is so flexible and easy to hook into, that I'm not
worried about adding these settings. Should be a breeze once the
software itself is settled.
+1 to the whole thing.
--
Chris Anderson
http://jchris.mfdz.com
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Jan 2009, at 20:51, Geir Magnusson Jr. wrote:
>
> fsync() on Linux and OS X flushes to disk. I'm not suggesting that
> it doesn't.
>
> What it doesn't do is flush the write caches *on* the disk unit
> itself (which is what F_FULLSYNC supposedly does, hence the sloth)
>
> IOW, with fsync(), there's no guarantee that the bits get written to
> the physical media. As far as the OS knows, the FS caches are
> flushed to the device, but the device may still be holding in it's
> own RAM.
We haven't gotten any data-loss reports on Linux systems
using fsync(). We did for Mac OS X prior to the F_FULLFSYNC
patch in Erlang.
> re the FULLFSYNC change, do you have the option to not have it used,
> but have fsync() used instead?
Nope, file:fsync() (Erlang) calls fsync() (C) on Linux and
fcntl(F_FULLFSYNC) (C) on Darwin-based OSes, this is
hardcoded.
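For readers who don't know the C calls, the platform split described above can be sketched in Python (not the Erlang VM's actual code). fcntl.F_FULLFSYNC is only defined by Python on platforms that have the flag, so the sketch probes for it; the function name sync_to_media and the fallback on error are assumptions of this example.

```python
import fcntl
import os
import tempfile


def sync_to_media(fd):
    """Use F_FULLFSYNC where the platform defines it, else plain fsync().

    Returns the name of the call that was used.
    """
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        try:
            # Darwin: also asks the drive to flush its own write cache.
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            # Assumption: some filesystems reject the flag; fall back.
            pass
    # Linux and others: flush OS caches to the device.
    os.fsync(fd)
    return "fsync"


with tempfile.TemporaryFile() as f:
    f.write(b"some update")
    f.flush()                      # move Python's buffer into the OS first
    used = sync_to_media(f.fileno())
```

Timing the two branches on the same machine is one way to see the 5ms versus 50-100ms difference mentioned at the start of the thread.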
>> If necessary and possible, we'll patch the Erlang VM.
>
> That seems like a bad idea to me - I'd think you'd want to stay out
> of the VM business.
We are pushing the Erlang VM in various ways. So far,
feedback was very welcome by the VM developers and
we were encouraged to keep this up. Note, that the code
of the VM is tightly controlled and patches are evaluated
thoroughly by Ericsson.
Cheers
Jan
--
Re: Faster updates, optional ACID
Posted by Randall Leeds <ra...@gmail.com>.
IIRC I saw something about OS X and this flush issue before. It came up when
I was talking to an FS person at Apple.
I believe that the explanation was that hardware manufacturers often lie
about the flush call to gain better benchmark scores. The way Apple ensures
that the full sync is actually complete is to stuff the buffer of the disk
with nonsense after the write to ensure that the real data has been pushed
out.
Something similar could without a doubt be done on Linux, but I don't know
that the OS handles it in any exposed way yet. I'm not familiar enough with
the appropriate Linux syscalls, but perhaps you could patch the Erlang VM to
do something similar to OS X: query and discover the size of the write
buffer on the hardware and write that much garbage in order to flush.
IMO this is an ugly software hack to patch a hardware problem. If you
_absolutely need_ the full flushing you should figure out which hard drive
manufacturers don't produce this sort of flawed fsync behavior or get
something like a battery-backed RAID. Unfortunately, we're stuck in a hard
place as a result of silly, competitive benchmark races by hard disk
manufacturers.
-Randall
On Mon, Jan 5, 2009 at 15:04, Damien Katz <da...@apache.org> wrote:
>
> On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
>
>> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>>
>>
>>> If necessary and possible, we'll patch the Erlang VM.
>>>
>>
>> That seems like a bad idea to me - I'd think you'd want to stay out of the
>> VM business.
>>
>
> No, I mean send patches to the maintainers of Erlang to fix any problems on
> their supported platforms. Just like the F_FULLFSYNC patch.
>
>
>>
>> But if a platform doesn't support proper flushing, then it's not a
>>> platform that can support an ACID database.
>>>
>>
>> We're not communicating well here.
>>
>> "proper flushing" depends on what you want to do - if you need your data
>> to in confirmed permanent storage so that it can survive a crash or power
>> cut, then w/o special configuration (e.g. battery-backed RAID, for example),
>> I don't think that you're going to get assurance on linux.
>>
>> Do you see what I'm saying?
>>
>>
> Yes I see what you are saying. Can you show that Linux doesn't actually
> safely push the bits to disk in popular distros? If that's the case, then we
> need to find the APIs that actually work and call them, and if they don't
> work, we don't support Linux.
>
>
>>> why not make it a config option, so that the db admin can choose the
>>>> durability level in general, and let clients that know they are talking to
>>>> couch override w/ a header?
>>>>
>>>>
>>> Definitely, I think commit options should be settable per-database. But
>>> for now I was just wanting to address the slowdown, especially for
>>> replication and the tests, to keep everyone productive. More commit features
>>> and options is lower priority work for now, I was just addresses the most
>>> serious slowdown.
>>>
>>
>> That makes sense, but IMO you papered over the root problem.
>> It's good to keep people working, but I think the issue deserves a look.
>> I don't know erlang, or I would look myself.
>>
>
> What issue? Why do you think this is Erlang specific?
>
>
>
>> geir
>>
>
>
Re: Faster updates, optional ACID
Posted by Lawrence Pit <la...@gmail.com>.
Chris Anderson asked me this earlier on the user list:
> Quick question: was this version of CouchDB built freshly against the
> latest Erlang? We've noticed that some performance changes on an
> Erlang upgrade don't appear until after Couch is rebuilt.
Given your results below I started thinking my memory probably wasn't
serving me right... usually I build from source, and I'm pretty sure I did
with CouchDB, but now I also remember I /tried/ installing it from darwin
ports or something similar, pre-compiled stuff, which, to my best
recollection, didn't work.
I don't know Erlang that well..... could it be possible that the sources
I have for 0.8 were compiled with R12B-3, and give good results when run
under R12B-5, while the same sources compiled with R12B-5 give
bad results?
Cheers,
Lawrence
> For completeness sake:
>
> The previous results were against Erlang R12B-5.
>
> Here's R12B-3 which doesn't have the fsync() fix:
>
> CouchDB 0.8.0:
> Requests per second: 184.42 [#/sec] (mean)
>
>
> CouchDB 0.8.1:
> Requests per second: 185.74 [#/sec] (mean)
>
>
> CouchDB trunk r731451 (pre-async-commit-patch):
> Requests per second: 199.30 [#/sec] (mean)
>
> Cheers
> Jan
> --
>
> On 6 Jan 2009, at 16:10, Jan Lehnardt wrote:
>
>>
>> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>>
>>> Interesting indeed. I was seeing:
>>>
>>> CouchDB/0.8.0-incubating
>>>
>>> I assume that is different from CouchDB 0.8.1 ?
>>
>>
>> CouchDB 0.8.0:
>>
>> Requests per second: 5.29 [#/sec] (mean)
>>
>> Cheers
>> Jan
>> --
>>
>>>
>>> Cheers,
>>> Lawrence
>>>
>>>> Interesting. I wonder that Lawrence was seeing...
>>>>
>>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>>
>>>>>
>>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>>
>>>>>> It was reported that w/ the same up-to-date version of erlang,
>>>>>> they found a big performance difference between 0.8 and current
>>>>>> trunk. If that's true, then it seems to me that something
>>>>>> changed in the filesystem handling in the CouchDB code itself -
>>>>>> it could be that there are multiple flush modes, and the 0.8 code
>>>>>> used whatever corresponds to fsync(), and trunk uses whatever
>>>>>> corresponds to fnctl(F_FULLSYNC). I don't know It's a guess.
>>>>>> But yesterdays results are unexplained, and I hate mysteries.
>>>>>
>>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'
>>>>> http://127.0.0.1:5984/test_suite_db
>>>>>
>>>>> CouchDB 0.8.1:
>>>>> Requests per second: 6.56 [#/sec] (mean)
>>>>>
>>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>>> Requests per second: 5.94 [#/sec] (mean)
>>>>>
>>>>>
>>>>> Cheers
>>>>> Jan
>>>>> --
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
For completeness' sake:
The previous results were against Erlang R12B-5.
Here's R12B-3 which doesn't have the fsync() fix:
CouchDB 0.8.0:
Requests per second: 184.42 [#/sec] (mean)
CouchDB 0.8.1:
Requests per second: 185.74 [#/sec] (mean)
CouchDB trunk r731451 (pre-async-commit-patch):
Requests per second: 199.30 [#/sec] (mean)
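To put the two sets of numbers side by side, the slowdown factor per version can be worked out with a few lines of Python. The figures are the requests-per-second values reported in this thread (R12B-3 above, R12B-5 in the earlier results quoted below); the variable and function names are just for this sketch.

```python
# Requests per second from the ab runs reported in this thread.
r12b3 = {"0.8.0": 184.42, "0.8.1": 185.74, "trunk r731451": 199.30}
r12b5 = {"0.8.0": 5.29, "0.8.1": 6.56, "trunk r731451": 5.94}


def slowdown(fast, slow):
    """Ratio of requests/sec for each version present in both result sets."""
    return {version: fast[version] / slow[version] for version in fast}


ratios = slowdown(r12b3, r12b5)
for version, ratio in ratios.items():
    print(f"CouchDB {version}: {ratio:.1f}x fewer requests/sec on R12B-5")
```

All three versions land in roughly the same range, which fits the slowdown coming from the Erlang upgrade rather than from changes in the CouchDB code.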
Cheers
Jan
--
On 6 Jan 2009, at 16:10, Jan Lehnardt wrote:
>
> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>
>> Interesting indeed. I was seeing:
>>
>> CouchDB/0.8.0-incubating
>>
>> I assume that is different from CouchDB 0.8.1 ?
>
>
> CouchDB 0.8.0:
>
> Requests per second: 5.29 [#/sec] (mean)
>
> Cheers
> Jan
> --
>
>>
>> Cheers,
>> Lawrence
>>
>>> Interesting. I wonder that Lawrence was seeing...
>>>
>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>
>>>>
>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>
>>>>> It was reported that w/ the same up-to-date version of erlang,
>>>>> they found a big performance difference between 0.8 and current
>>>>> trunk. If that's true, then it seems to me that something
>>>>> changed in the filesystem handling in the CouchDB code itself -
>>>>> it could be that there are multiple flush modes, and the 0.8
>>>>> code used whatever corresponds to fsync(), and trunk uses
>>>>> whatever corresponds to fnctl(F_FULLSYNC). I don't know It's a
>>>>> guess. But yesterdays results are unexplained, and I hate
>>>>> mysteries.
>>>>
>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json' http://127.0.0.1:5984/test_suite_db
>>>>
>>>> CouchDB 0.8.1:
>>>> Requests per second: 6.56 [#/sec] (mean)
>>>>
>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>> Requests per second: 5.94 [#/sec] (mean)
>>>>
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>
>>>
>>
>>
>
>
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
the mystery gets better...
On Jan 6, 2009, at 10:10 AM, Jan Lehnardt wrote:
>
> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>
>> Interesting indeed. I was seeing:
>>
>> CouchDB/0.8.0-incubating
>>
>> I assume that is different from CouchDB 0.8.1 ?
>
>
> CouchDB 0.8.0:
>
> Requests per second: 5.29 [#/sec] (mean)
>
> Cheers
> Jan
> --
>
>>
>> Cheers,
>> Lawrence
>>
>>> Interesting. I wonder that Lawrence was seeing...
>>>
>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>
>>>>
>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>
>>>>> It was reported that w/ the same up-to-date version of erlang,
>>>>> they found a big performance difference between 0.8 and current
>>>>> trunk. If that's true, then it seems to me that something
>>>>> changed in the filesystem handling in the CouchDB code itself -
>>>>> it could be that there are multiple flush modes, and the 0.8
>>>>> code used whatever corresponds to fsync(), and trunk uses
>>>>> whatever corresponds to fnctl(F_FULLSYNC). I don't know It's a
>>>>> guess. But yesterdays results are unexplained, and I hate
>>>>> mysteries.
>>>>
>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json' http://127.0.0.1:5984/test_suite_db
>>>>
>>>> CouchDB 0.8.1:
>>>> Requests per second: 6.56 [#/sec] (mean)
>>>>
>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>> Requests per second: 5.94 [#/sec] (mean)
>>>>
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>
>>>
>>
>>
>
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
> Interesting indeed. I was seeing:
>
> CouchDB/0.8.0-incubating
>
> I assume that is different from CouchDB 0.8.1 ?
CouchDB 0.8.0:
Requests per second: 5.29 [#/sec] (mean)
Cheers
Jan
--
>
> Cheers,
> Lawrence
>
>> Interesting. I wonder that Lawrence was seeing...
>>
>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>
>>>
>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>
>>>> It was reported that w/ the same up-to-date version of erlang,
>>>> they found a big performance difference between 0.8 and current
>>>> trunk. If that's true, then it seems to me that something
>>>> changed in the filesystem handling in the CouchDB code itself -
>>>> it could be that there are multiple flush modes, and the 0.8 code
>>>> used whatever corresponds to fsync(), and trunk uses whatever
>>>> corresponds to fnctl(F_FULLSYNC). I don't know It's a guess.
>>>> But yesterdays results are unexplained, and I hate mysteries.
>>>
>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json' http://127.0.0.1:5984/test_suite_db
>>>
>>> CouchDB 0.8.1:
>>> Requests per second: 6.56 [#/sec] (mean)
>>>
>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>> Requests per second: 5.94 [#/sec] (mean)
>>>
>>>
>>> Cheers
>>> Jan
>>> --
>>
>>
>
>
Re: Faster updates, optional ACID
Posted by Lawrence Pit <la...@gmail.com>.
Interesting indeed. I was seeing:
CouchDB/0.8.0-incubating
I assume that is different from CouchDB 0.8.1 ?
Cheers,
Lawrence
> Interesting. I wonder that Lawrence was seeing...
>
> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>
>>
>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>
>>> It was reported that w/ the same up-to-date version of erlang, they
>>> found a big performance difference between 0.8 and current trunk.
>>> If that's true, then it seems to me that something changed in the
>>> filesystem handling in the CouchDB code itself - it could be that
>>> there are multiple flush modes, and the 0.8 code used whatever
>>> corresponds to fsync(), and trunk uses whatever corresponds to
>>> fnctl(F_FULLSYNC). I don't know It's a guess. But yesterdays
>>> results are unexplained, and I hate mysteries.
>>
>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'
>> http://127.0.0.1:5984/test_suite_db
>>
>> CouchDB 0.8.1:
>> Requests per second: 6.56 [#/sec] (mean)
>>
>> CouchDB trunk r731451 (pre-async-commit-patch):
>> Requests per second: 5.94 [#/sec] (mean)
>>
>>
>> Cheers
>> Jan
>> --
>
>
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
Interesting. I wonder what Lawrence was seeing...
On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>
> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>
>> It was reported that w/ the same up-to-date version of erlang, they
>> found a big performance difference between 0.8 and current trunk.
>> If that's true, then it seems to me that something changed in the
>> filesystem handling in the CouchDB code itself - it could be that
>> there are multiple flush modes, and the 0.8 code used whatever
>> corresponds to fsync(), and trunk uses whatever corresponds to
>> fnctl(F_FULLSYNC). I don't know It's a guess. But yesterdays
>> results are unexplained, and I hate mysteries.
>
> $ ab -c 10 -n 1000 -p emptypost -T 'application/json' http://127.0.0.1:5984/test_suite_db
>
> CouchDB 0.8.1:
> Requests per second: 6.56 [#/sec] (mean)
>
> CouchDB trunk r731451 (pre-async-commit-patch):
> Requests per second: 5.94 [#/sec] (mean)
>
>
> Cheers
> Jan
> --
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>
> It was reported that w/ the same up-to-date version of erlang, they
> found a big performance difference between 0.8 and current trunk.
> If that's true, then it seems to me that something changed in the
> filesystem handling in the CouchDB code itself - it could be that
> there are multiple flush modes, and the 0.8 code used whatever
> corresponds to fsync(), and trunk uses whatever corresponds to
> fnctl(F_FULLSYNC). I don't know It's a guess. But yesterdays
> results are unexplained, and I hate mysteries.
$ ab -c 10 -n 1000 -p emptypost -T 'application/json' http://127.0.0.1:5984/test_suite_db
CouchDB 0.8.1:
Requests per second: 6.56 [#/sec] (mean)
CouchDB trunk r731451 (pre-async-commit-patch):
Requests per second: 5.94 [#/sec] (mean)
Cheers
Jan
--
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
On Jan 5, 2009, at 3:04 PM, Damien Katz wrote:
>
> On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
>>
>> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>>
>>>
>>> If necessary and possible, we'll patch the Erlang VM.
>>
>> That seems like a bad idea to me - I'd think you'd want to stay out
>> of the VM business.
>
> No, I mean send patches to the maintainers of Erlang to fix any
> problems on their supported platforms. Just like the F_FULLFSYNC
> patch.
Ah. Whew :)
>>
>>
>>> But if a platform doesn't support proper flushing, then it's not a
>>> platform that can support an ACID database.
>>
>> We're not communicating well here.
>>
>> "proper flushing" depends on what you want to do - if you need your
>> data to in confirmed permanent storage so that it can survive a
>> crash or power cut, then w/o special configuration (e.g. battery-
>> backed RAID, for example), I don't think that you're going to get
>> assurance on linux.
>>
>> Do you see what I'm saying?
>>
>
> Yes I see what you are saying. Can you show that Linux doesn't
> actually safely push the bits to disk in popular distros? If that's
> the case, then we need to find the APIs that actually work and call
> them, and if they don't work, we don't support Linux.
It pushes the bits to the disk drive, but that's where its sphere of
effect ends - what the drive does after that is drive specific.
Drives cache writes to aggregate, or write things out of order based
on head location, etc.
This isn't something that only affects Couch.
So I would say that .... it's time to relax.
Take the approach that you have a few modes:
a) fsync() mode - for people that care about true durability, it's
up to them to get or configure drives to behave right, or whatever
b) the delayed write mode so that you can do things like aggregate
writes into clocked fsyncs or something (I'd use this - I'll trade
some durability for the performance)
c) and for platforms that offer special modes that really do
guarantee the write all the way to the physical media, like OS X's
fcntl(F_FULLFSYNC), make that an option too.
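Mode (b), aggregating writes into clocked fsyncs, might be sketched like this in Python. This is one reading of the idea, not how CouchDB's delayed commit is implemented; the class name, the interval parameter and the injectable clock are all assumptions made so the behaviour can be shown deterministically.

```python
import os
import tempfile


class ClockedWriter:
    """Append writes immediately, but fsync at most once per interval."""

    def __init__(self, path, interval_s, clock):
        self._f = open(path, "ab")
        self._interval = interval_s
        self._clock = clock              # callable returning seconds
        self._last_sync = clock()
        self.sync_count = 0

    def write(self, data):
        self._f.write(data)
        # Only pay for a sync when the interval has elapsed.
        if self._clock() - self._last_sync >= self._interval:
            self.full_commit()

    def full_commit(self):
        self._f.flush()                  # Python buffer -> OS
        os.fsync(self._f.fileno())       # OS caches -> device
        self._last_sync = self._clock()
        self.sync_count += 1

    def close(self):
        self.full_commit()               # never leave delayed writes behind
        self._f.close()


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
path = os.path.join(tempfile.mkdtemp(), "updates.log")
writer = ClockedWriter(path, interval_s=1.0, clock=lambda: now[0])
writer.write(b"update 1\n")              # within the interval: no sync
syncs_after_first = writer.sync_count
now[0] = 1.5
writer.write(b"update 2\n")              # interval elapsed: one sync
syncs_after_second = writer.sync_count
writer.close()
```

Updates written between two syncs are the ones at risk in a crash, which is the trade described for this mode.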
>>>
>>>> why not make it a config option, so that the db admin can choose
>>>> the durability level in general, and let clients that know they
>>>> are talking to couch override w/ a header?
>>>>
>>>
>>> Definitely, I think commit options should be settable per-
>>> database. But for now I was just wanting to address the slowdown,
>>> especially for replication and the tests, to keep everyone
>>> productive. More commit features and options is lower priority
>>> work for now, I was just addresses the most serious slowdown.
>>
>> That makes sense, but IMO you papered over the root problem.
>> It's good to keep people working, but I think the issue deserves a
>> look. I don't know erlang, or I would look myself.
>
> What issue? Why do you think this is Erlang specific?
Oh - this is a SWAG based on one data point :) [it was a rough day -
I didn't get to try to duplicate the results found yesterday...]
http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
It was reported that w/ the same up-to-date version of erlang, they
found a big performance difference between 0.8 and current trunk. If
that's true, then it seems to me that something changed in the
filesystem handling in the CouchDB code itself - it could be that
there are multiple flush modes, and the 0.8 code used whatever
corresponds to fsync(), and trunk uses whatever corresponds to
fcntl(F_FULLFSYNC). I don't know. It's a guess. But yesterday's
results are unexplained, and I hate mysteries.
I can't help with the erlang (I don't know it...), but I can at least
try to reproduce the results...
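One way to get a comparable number outside of Erlang is to time plain write+fsync() calls on the same machine. This is only a sketch: `time_syncs` is a made-up helper, it measures the OS and disk rather than CouchDB or the Erlang VM, and it uses fsync() only, so on OS X it will not show the F_FULLFSYNC cost.

```python
import os
import tempfile
import time


def time_syncs(n=20, payload=b"x" * 4096):
    """Return the mean seconds per write+fsync over n iterations."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, payload)
            os.fsync(fd)   # ask the OS to push its caches to the device
        return (time.perf_counter() - start) / n
    finally:
        os.close(fd)
        os.remove(path)
```

Comparing `time_syncs() * 1000` against the ~5ms and 50-100ms figures quoted in this thread gives a rough idea of which kind of flush a given setup is actually paying for.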
geir
Re: Faster updates, optional ACID
Posted by Damien Katz <da...@apache.org>.
On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>
>>
>> If necessary and possible, we'll patch the Erlang VM.
>
> That seems like a bad idea to me - I'd think you'd want to stay out
> of the VM business.
No, I mean send patches to the maintainers of Erlang to fix any
problems on their supported platforms. Just like the F_FULLFSYNC patch.
>
>
>> But if a platform doesn't support proper flushing, then it's not a
>> platform that can support an ACID database.
>
> We're not communicating well here.
>
> "proper flushing" depends on what you want to do - if you need your
> data to be in confirmed permanent storage so that it can survive a
> crash or power cut, then w/o special configuration (e.g. battery-
> backed RAID, for example), I don't think that you're going to get
> assurance on linux.
>
> Do you see what I'm saying?
>
Yes I see what you are saying. Can you show that Linux doesn't
actually safely push the bits to disk in popular distros? If that's
the case, then we need to find the APIs that actually work and call
them, and if they don't work, we don't support Linux.
>>
>>> why not make it a config option, so that the db admin can choose
>>> the durability level in general, and let clients that know they
>>> are talking to couch override w/ a header?
>>>
>>
>> Definitely, I think commit options should be settable per-database.
>> But for now I was just wanting to address the slowdown, especially
>> for replication and the tests, to keep everyone productive. More
>> commit features and options is lower priority work for now, I was
>> just addressing the most serious slowdown.
>
> That makes sense, but IMO you papered over the root problem.
> It's good to keep people working, but I think the issue deserves a
> look. I don't know erlang, or I would look myself.
What issue? Why do you think this is Erlang specific?
>
> geir
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>
> On Jan 5, 2009, at 1:59 PM, Geir Magnusson Jr. wrote:
>
>>
>> On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:
>>
>>> It was brought to my attention that commits on OS X were very slow
>>> with the latest releases of Erlang. After I upgraded to the most
>>> recent version, I found them to be indeed slow, slowing the tests
>>> down to point it was painful to run them. It appears, any disk
>>> sync from Erlang now takes somewhere between 50 and 100ms, up from
>>> the previous times of ~5ms. This is almost certainly due to the
>>> F_FULLFSYNC flag the erlang file handling now uses on darwin based
>>> systems, but it's surprising how bad performance on OS X. A little
>>> investigation had shown other database engines have similar issues
>>> on OS X.
>>
>> One user, though, reported a huge performance difference between
>> 0.8 and trunk *using the same version of Erlang*.
>>
>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
>>
>> To me, this hints that something changed in CouchDB that triggers
>> usage of fcntl(F_FULLFSYNC).
>>
>> I haven't tried to duplicate but will. If this can be verified, I
>> think that this is an important mystery to solve.
>>
>>>
>>>
>>> To address this problem, I implemented delayed commit
>>> functionality. We had always intended to implement delayed commit
>>> for performance reasons, but hadn't had the need until now. This
>>> makes updates much faster in the general case, but with the caveat
>>> they aren't flushed completely to disk right way. If you can't
>>> tolerate the possible loss of recent updates, you can use the
>>> "full commit" option for ACID commits.
>>
>> So this brings up a question. There is no way to get the same
>> durability semantics on Linux as you can get on OS X with F_FULLFSYNC.
>
> On linux, as always it depends (on distro, file system, etc), but
> generally fsync flushes to disk, or so I've been told by those who
> should know and I've not seen credible evidence otherwise. But if
> fsync is broken by default on Linux (like say Debian based distros),
> file a bug and we'll see about getting Erlang patched with the proper
> APIs (the Erlang F_FULLFSYNC change was from us too).
fsync() on Linux and OS X flushes to disk. I'm not suggesting that it
doesn't.
What it doesn't do is flush the write caches *on* the disk unit itself
(which is what F_FULLFSYNC supposedly does, hence the sloth).
IOW, with fsync(), there's no guarantee that the bits get written to
the physical media. As far as the OS knows, the FS caches are flushed
to the device, but the device may still be holding them in its own RAM.
Re the F_FULLFSYNC change, do you have the option to not have it used,
but have fsync() used instead?
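To make the distinction concrete, here is a small Python sketch of such an opt-out. `sync_to_disk` and its `use_fullfsync` flag are hypothetical names, not anything in Erlang or CouchDB; it relies only on `os.fsync` and, where Python's fcntl module exposes it (OS X), the `F_FULLFSYNC` command.

```python
import fcntl
import os


def sync_to_disk(fd, use_fullfsync=True):
    """Flush fd and return the name of the call that was used."""
    flag = getattr(fcntl, "F_FULLFSYNC", None)  # not defined on Linux
    if use_fullfsync and flag is not None:
        try:
            # Asks the drive to flush its own write cache as well.
            fcntl.fcntl(fd, flag)
            return "F_FULLFSYNC"
        except OSError:
            pass  # some file systems reject it; fall back to fsync()
    # Flushes the OS caches to the device; the device may still buffer.
    os.fsync(fd)
    return "fsync"
```

Passing `use_fullfsync=False` gives the plain fsync() behaviour on every platform.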
>
>
>> This means that the "full commit" option really gives you different
>> levels of durability, depending on whether or not you are on OS X.
>>
>> And thinking more about what appears to be the perf bug/slowdown in
>> CouchDB code, might this warrant three options?
>>
>> 1) delayed commit (what you did last night)
>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>> 3) optional F_FULLFSYNC commit, on OS X and any other platform that
>> provides this level of commit
>>
>
> If necessary and possible, we'll patch the Erlang VM.
That seems like a bad idea to me - I'd think you'd want to stay out of
the VM business.
> But if a platform doesn't support proper flushing, then it's not a
> platform that can support an ACID database.
We're not communicating well here.
"proper flushing" depends on what you want to do - if you need your
data to be in confirmed permanent storage so that it can survive a crash
or power cut, then w/o special configuration (e.g. battery-backed
RAID), I don't think that you're going to get assurance
on Linux.
Do you see what I'm saying?
>
>
>>>
>>>
>>> For full acid commit, add a header field to the doc PUT or
>>> _bulk_docs POST like this:
>>> X-Couch-Full-Commit:true
>>>
>>> Then couchdb will completely commit the change before returning.
>>
>>
>> That's cool but it puts the burden on the client -
>
> True, but they already have the API they must conform to; I don't
> see this option as being particularly burdensome unless it's simply
> the wrong default.
But it keeps adding requirements to the API that aren't necessarily
in the application domain.
>
>
>> why not make it a config option, so that the db admin can choose
>> the durability level in general, and let clients that know they are
>> talking to couch override w/ a header?
>>
>
> Definitely, I think commit options should be settable per-database.
> But for now I was just wanting to address the slowdown, especially
> for replication and the tests, to keep everyone productive. More
> commit features and options is lower priority work for now, I was
> just addressing the most serious slowdown.
That makes sense, but IMO you papered over the root problem. It's
good to keep people working, but I think the issue deserves a look. I
don't know erlang, or I would look myself.
geir
>
>
> Also, having the default be delayed commit will help us flush out
> any problems, especially for usability and real production
> testing. If it's broken, either a simple bug or by design, we need
> to know it as soon as possible.
>
> -Damien
>
>>
>> geir
>>
>>>
>>>
>>> Also, if you have several delayed updates and you want to make
>>> sure they all made it to disk, you can invoke POST /db/
>>> _ensure_full_commit and all outstanding commits are flushed to disk.
>>>
>>> The view engine has been already modified to deal with delayed
>>> commits too, it ensures it never fully commits it's own indexes to
>>> disk if the documents indexed aren't already committed to disk.
>>>
>>> The last remaining work item is db server crash detection, so that
>>> clients can detect when a server has crashed and potentially lost
>>> updates. This is pretty simple, each db server just needs a unique
>>> ID generated at it's startup. Client retrieve this value at the
>>> beginning of the writes and then checks that the value is the same
>>> once down a flushed to disk. If not, we know we maybe have lost
>>> some updates and we redo the replication from the last known good
>>> commit.
>>>
>>> Right now the default is to delay the commits, because I think
>>> that will be the most common use case but I'm really not sure. I
>>> definitely want the commits delayed for the test suite, to keep
>>> things running fast.
>>>
>>> -Damien
>>>
>>>
>>
>
Re: Faster updates, optional ACID
Posted by Damien Katz <da...@apache.org>.
On Jan 5, 2009, at 1:59 PM, Geir Magnusson Jr. wrote:
>
> On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:
>
>> It was brought to my attention that commits on OS X were very slow
>> with the latest releases of Erlang. After I upgraded to the most
>> recent version, I found them to be indeed slow, slowing the tests
>> down to point it was painful to run them. It appears, any disk sync
>> from Erlang now takes somewhere between 50 and 100ms, up from the
>> previous times of ~5ms. This is almost certainly due to the
>> F_FULLFSYNC flag the erlang file handling now uses on darwin based
>> systems, but it's surprising how bad performance on OS X. A little
>> investigation had shown other database engines have similar issues
>> on OS X.
>
> One user, though, reported a huge performance difference between
> 0.8 and trunk *using the same version of Erlang*.
>
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
>
> To me, this hints that something changed in CouchDB that triggers
> usage of fcntl(F_FULLFSYNC).
>
> I haven't tried to duplicate but will. If this can be verified, I
> think that this is an important mystery to solve.
>
>>
>>
>> To address this problem, I implemented delayed commit
>> functionality. We had always intended to implement delayed commit
>> for performance reasons, but hadn't had the need until now. This
>> makes updates much faster in the general case, but with the caveat
>> they aren't flushed completely to disk right way. If you can't
>> tolerate the possible loss of recent updates, you can use the "full
>> commit" option for ACID commits.
>
> So this brings up a question. There is no way to get the same
> durability semantics on Linux as you can get on OS X with F_FULLFSYNC.
On Linux, as always, it depends (on distro, file system, etc.), but
generally fsync flushes to disk, or so I've been told by those who
should know, and I've not seen credible evidence otherwise. But if
fsync is broken by default on Linux (on, say, Debian-based distros),
file a bug and we'll see about getting Erlang patched with the proper
APIs (the Erlang F_FULLFSYNC change was from us too).
> This means that the "full commit" option really gives you different
> levels of durability, depending on whether or not you are on OS X.
>
> And thinking more about what appears to be the perf bug/slowdown in
> CouchDB code, might this warrant three options?
>
> 1) delayed commit (what you did last night)
> 2) fsync() commit (what I suspect Couch did on and around 0.8)
> 3) optional F_FULLFSYNC commit, on OS X and any other platform that
> provides this level of commit
>
If necessary and possible, we'll patch the Erlang VM. But if a
platform doesn't support proper flushing, then it's not a platform
that can support an ACID database.
>>
>>
>> For full acid commit, add a header field to the doc PUT or
>> _bulk_docs POST like this:
>> X-Couch-Full-Commit:true
>>
>> Then couchdb will completely commit the change before returning.
>
>
> That's cool but it puts the burden on the client -
True, but they already have the API they must conform to; I don't see
this option as being particularly burdensome unless it's simply the
wrong default.
> why not make it a config option, so that the db admin can choose the
> durability level in general, and let clients that know they are
> talking to couch override w/ a header?
>
Definitely, I think commit options should be settable per-database.
But for now I just wanted to address the slowdown, especially for
replication and the tests, to keep everyone productive. More commit
features and options are lower-priority work for now; I was just
addressing the most serious slowdown.
Also, having the default be delayed commit will help us flush out any
problems, especially for usability and real production testing. If
it's broken, either a simple bug or by design, we need to know it as
soon as possible.
-Damien
>
> geir
>
>>
>>
>> Also, if you have several delayed updates and you want to make sure
>> they all made it to disk, you can invoke POST /db/
>> _ensure_full_commit and all outstanding commits are flushed to disk.
>>
>> The view engine has been already modified to deal with delayed
>> commits too, it ensures it never fully commits it's own indexes to
>> disk if the documents indexed aren't already committed to disk.
>>
>> The last remaining work item is db server crash detection, so that
>> clients can detect when a server has crashed and potentially lost
>> updates. This is pretty simple, each db server just needs a unique
>> ID generated at it's startup. Client retrieve this value at the
>> beginning of the writes and then checks that the value is the same
>> once down a flushed to disk. If not, we know we maybe have lost
>> some updates and we redo the replication from the last known good
>> commit.
>>
>> Right now the default is to delay the commits, because I think that
>> will be the most common use case but I'm really not sure. I
>> definitely want the commits delayed for the test suite, to keep
>> things running fast.
>>
>> -Damien
>>
>>
>
Re: Faster updates, optional ACID
Posted by "Geir Magnusson Jr." <ge...@pobox.com>.
On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:
> It was brought to my attention that commits on OS X were very slow
> with the latest releases of Erlang. After I upgraded to the most
> recent version, I found them to be indeed slow, slowing the tests
> down to point it was painful to run them. It appears, any disk sync
> from Erlang now takes somewhere between 50 and 100ms, up from the
> previous times of ~5ms. This is almost certainly due to the
> F_FULLFSYNC flag the erlang file handling now uses on darwin based
> systems, but it's surprising how bad performance on OS X. A little
> investigation had shown other database engines have similar issues
> on OS X.
One user, though, reported a huge performance difference between 0.8
and trunk *using the same version of Erlang*.
http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
To me, this hints that something changed in CouchDB that triggers
usage of fcntl(F_FULLFSYNC).
I haven't tried to duplicate but will. If this can be verified, I
think that this is an important mystery to solve.
>
>
> To address this problem, I implemented delayed commit functionality.
> We had always intended to implement delayed commit for performance
> reasons, but hadn't had the need until now. This makes updates much
> faster in the general case, but with the caveat they aren't flushed
> completely to disk right way. If you can't tolerate the possible
> loss of recent updates, you can use the "full commit" option for
> ACID commits.
So this brings up a question. There is no way to get the same
durability semantics on Linux as you can get on OS X with F_FULLFSYNC.
This means that the "full commit" option really gives you different
levels of durability, depending on whether or not you are on OS X.
And thinking more about what appears to be the perf bug/slowdown in
CouchDB code, might this warrant three options?
1) delayed commit (what you did last night)
2) fsync() commit (what I suspect Couch did on and around 0.8)
3) optional F_FULLFSYNC commit, on OS X and any other platform that
provides this level of commit
>
>
> For full acid commit, add a header field to the doc PUT or
> _bulk_docs POST like this:
> X-Couch-Full-Commit:true
>
> Then couchdb will completely commit the change before returning.
That's cool but it puts the burden on the client - why not make it a
config option, so that the db admin can choose the durability level in
general, and let clients that know they are talking to couch override
w/ a header?
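As a sketch of what that could look like on the server side (in Python, not Erlang): the admin's setting is the default and the X-Couch-Full-Commit header, when present, wins. `wants_full_commit` and `db_default` are hypothetical names; no such per-database setting exists yet, only the header.

```python
def wants_full_commit(headers, db_default=False):
    """Decide whether a request should be fully committed before replying.

    headers:    mapping of request header names to values
    db_default: hypothetical per-database setting chosen by the admin
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-couch-full-commit":
            return value.strip().lower() == "true"
    # No header sent: the client gets whatever the admin configured.
    return db_default
```

A client that knows nothing about Couch sends no header and gets the database's durability level; one that does know can ask for more (or less) per request.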
geir
>
>
> Also, if you have several delayed updates and you want to make sure
> they all made it to disk, you can invoke POST /db/
> _ensure_full_commit and all outstanding commits are flushed to disk.
>
> The view engine has been already modified to deal with delayed
> commits too, it ensures it never fully commits it's own indexes to
> disk if the documents indexed aren't already committed to disk.
>
> The last remaining work item is db server crash detection, so that
> clients can detect when a server has crashed and potentially lost
> updates. This is pretty simple, each db server just needs a unique
> ID generated at it's startup. Client retrieve this value at the
> beginning of the writes and then checks that the value is the same
> once down a flushed to disk. If not, we know we maybe have lost some
> updates and we redo the replication from the last known good commit.
>
> Right now the default is to delay the commits, because I think that
> will be the most common use case but I'm really not sure. I
> definitely want the commits delayed for the test suite, to keep
> things running fast.
>
> -Damien
>
>
Re: Faster updates, optional ACID
Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Jan 2009, at 18:54, Damien Katz wrote:
> The last remaining work item is db server crash detection, so that
> clients can detect when a server has crashed and potentially lost
> updates. This is pretty simple, each db server just needs a unique
> ID generated at it's startup. Client retrieve this value at the
> beginning of the writes and then checks that the value is the same
> once down a flushed to disk. If not, we know we maybe have lost some
> updates and we redo the replication from the last known good commit.
We should document that pattern for clients that use CouchDB
and write data without replication. Their writes might also
be lost.
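For such a client the pattern might look roughly like this in Python. The `server` object and its three methods are assumptions made for the sketch (the startup ID is still a work item and its real API isn't decided); only the POST /db/_ensure_full_commit call is taken from Damien's description.

```python
def write_with_crash_check(server, docs):
    """Write docs, flush, and report whether the server restarted meanwhile.

    `server` is a hypothetical client object with three methods:
      instance_id()        -> the ID the server generated at startup
      put(doc)             -> a delayed-commit write
      ensure_full_commit() -> POST /db/_ensure_full_commit

    Returns True if the writes are known to be on disk, False if the
    server ID changed and the writes should be redone.
    """
    before = server.instance_id()   # remember which server run we started on
    for doc in docs:
        server.put(doc)
    server.ensure_full_commit()     # flush all outstanding delayed commits
    # A different ID means the server restarted and may have lost updates.
    return server.instance_id() == before
```

If it returns False, the client redoes the batch from its last known good point, the same way the replicator would.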
> Right now the default is to delay the commits, because I think that
> will be the most common use case but I'm really not sure. I
> definitely want the commits delayed for the test suite, to keep
> things running fast.
+1
Cheers
Jan
--