Posted to dev@couchdb.apache.org by Damien Katz <da...@apache.org> on 2009/01/05 18:54:58 UTC

Faster updates, optional ACID

It was brought to my attention that commits on OS X were very slow
with the latest releases of Erlang. After I upgraded to the most
recent version, I found them to be indeed slow, slowing the tests down
to the point where it was painful to run them. It appears that any disk
sync from Erlang now takes somewhere between 50 and 100ms, up from the
previous times of ~5ms. This is almost certainly due to the F_FULLFSYNC
flag the Erlang file handling now uses on Darwin-based systems, but it's
surprising how bad performance is on OS X. A little investigation has
shown other database engines have similar issues on OS X.

To address this problem, I implemented delayed commit functionality.
We had always intended to implement delayed commit for performance
reasons, but hadn't had the need until now. This makes updates much
faster in the general case, but with the caveat that they aren't
flushed completely to disk right away. If you can't tolerate the
possible loss of recent updates, you can use the "full commit" option
for ACID commits.

For full ACID commit, add a header field to the doc PUT or _bulk_docs
POST like this:
  X-Couch-Full-Commit:true

Then CouchDB will completely commit the change before returning.
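
A minimal sketch of such a request, built with Python's standard library. Only the header name and value come from the message above; the server address, the database name `mydb`, the document ID and the helper name `build_full_commit_put` are assumptions for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_full_commit_put(base_url, db, doc_id, doc):
    """Build (but do not send) a document PUT that asks for a full commit."""
    body = json.dumps(doc).encode("utf-8")
    req = urllib.request.Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    # The header described above: commit fully before the server returns.
    req.add_header("X-Couch-Full-Commit", "true")
    return req


# Hypothetical server address, database and document.
req = build_full_commit_put("http://127.0.0.1:5984", "mydb", "doc1", {"a": 1})
# urllib.request.urlopen(req) would send it to a running server.
```

Leaving the header out gives the delayed-commit behaviour, which is the default described in this message.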

Also, if you have several delayed updates and you want to make sure  
they all made it to disk, you can invoke POST /db/_ensure_full_commit  
and all outstanding commits are flushed to disk.
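
A hedged sketch of the flush call, again with Python's standard library. The endpoint path is the one named above; the server address, the database name and the helper name `build_ensure_full_commit` are assumptions, and the JSON content type is a guess at what the server expects rather than something stated in this thread.

```python
import urllib.request


def build_ensure_full_commit(base_url, db):
    """Build (but do not send) the POST that flushes outstanding commits."""
    req = urllib.request.Request(
        url=f"{base_url}/{db}/_ensure_full_commit",
        data=b"",          # empty body; the call carries no document
        method="POST",
    )
    # Assumption: the server accepts (or requires) a JSON content type here.
    req.add_header("Content-Type", "application/json")
    return req


flush_req = build_ensure_full_commit("http://127.0.0.1:5984", "mydb")
# urllib.request.urlopen(flush_req) would send it to a running server.
```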

The view engine has already been modified to deal with delayed commits
too: it never fully commits its own indexes to disk unless the
documents indexed are already committed to disk.

The last remaining work item is db server crash detection, so that
clients can detect when a server has crashed and potentially lost
updates. This is pretty simple: each db server just needs a unique ID
generated at its startup. The client retrieves this value at the
beginning of the writes and then checks that the value is the same once
the writes are done and flushed to disk. If not, we know we may have
lost some updates and we redo the replication from the last known good
commit.
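
The client-side check can be sketched with an in-memory stand-in for the server. Everything named here (`FakeServer`, `instance_id`, `write_batch`) is hypothetical and only models the pattern described above; it is not a CouchDB API, and the crash detection itself is still an open work item.

```python
import uuid


class FakeServer:
    """In-memory stand-in for a db server (hypothetical, for illustration)."""

    def __init__(self):
        self.instance_id = uuid.uuid4().hex  # unique ID generated at startup
        self.committed = {}                  # survives a crash
        self.pending = {}                    # delayed commits, lost on crash

    def write(self, doc_id, doc):
        self.pending[doc_id] = doc

    def ensure_full_commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def crash_and_restart(self):
        self.pending.clear()                 # uncommitted updates are gone
        self.instance_id = uuid.uuid4().hex  # new ID after the restart


def write_batch(server, docs, crash_after_first=False):
    """Write docs, flush, and report whether the batch is known to be on disk."""
    start_id = server.instance_id            # read the ID before writing
    for i, (doc_id, doc) in enumerate(docs.items()):
        server.write(doc_id, doc)
        if crash_after_first and i == 0:
            server.crash_and_restart()       # simulated crash mid-batch
    server.ensure_full_commit()
    # A changed ID means a restart happened and some updates may be lost.
    return server.instance_id == start_id


server = FakeServer()
docs = {"a": 1, "b": 2}
if not write_batch(server, docs, crash_after_first=True):
    write_batch(server, docs)                # redo from the last good commit
```

Keying the documents by ID keeps the redo idempotent, so repeating the whole batch after a detected restart does not duplicate anything.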

Right now the default is to delay the commits, because I think that  
will be the most common use case but I'm really not sure. I definitely  
want the commits delayed for the test suite, to keep things running  
fast.

-Damien

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 6 Jan 2009, at 12:56, Noah Slater wrote:

> On Tue, Jan 06, 2009 at 12:52:26PM +0100, Jan Lehnardt wrote:
>>
>> On 6 Jan 2009, at 12:43, Noah Slater wrote:
>>
>>> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>>>> For full acid commit, add a header field to the doc PUT or  
>>>> _bulk_docs
>>>> POST like this:
>>>> X-Couch-Full-Commit:true
>>>
>>> What is the cost/benefit of doing this via HTTP headers vs. query
>>> string?
>>
>> The HTTP headers way is implemented now. Damien said in another  
>> mail, that
>> different options for this (per request, and db- and server-wide  
>> config)
>> are on lower priority right now, but they will definitely come:
>
> Not sure I follow. This feature is already implemented? My question  
> was about
> the per-request way of setting this option. Should it be done using  
> HTTP headers
> or should it be done using a query string option?

Heh, I mean: The advantage is that the header version doesn't have to be
implemented, it exists. The query-parameter version doesn't. This is not
about the usefulness of the query-param.

I'd say the header is "more correct" because we are describing request
semantics, not content-options. But I see arguments for the
query-parameter version as well.

Cheers
Jan
--

Re: Faster updates, optional ACID

Posted by Noah Slater <ns...@apache.org>.

On Tue, Jan 06, 2009 at 12:52:26PM +0100, Jan Lehnardt wrote:
>
> On 6 Jan 2009, at 12:43, Noah Slater wrote:
>
>> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>>> For full acid commit, add a header field to the doc PUT or _bulk_docs
>>> POST like this:
>>> X-Couch-Full-Commit:true
>>
>> What is the cost/benefit of doing this via HTTP headers vs. query
>> string?
>
> The HTTP headers way is implemented now. Damien said in another mail, that
> different options for this (per request, and db- and server-wide config)
> are on lower priority right now, but they will definitely come:

Not sure I follow. This feature is already implemented? My question was about
the per-request way of setting this option. Should it be done using HTTP headers
or should it be done using a query string option?

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 6 Jan 2009, at 12:43, Noah Slater wrote:

> On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
>> For full acid commit, add a header field to the doc PUT or _bulk_docs
>> POST like this:
>> X-Couch-Full-Commit:true
>
> What is the cost/benefit of doing this via HTTP headers vs. query  
> string?

The HTTP headers way is implemented now. Damien said in another mail,
that different options for this (per request, and db- and server-wide
config) are on lower priority right now, but they will definitely come:

"Definitely, I think commit options should be settable per-database.
But for now I was just wanting to address the slowdown, especially for
replication and the tests, to keep everyone productive. More commit
features and options is lower priority work for now, I was just
addresses the most serious slowdown."
   -- Damien in <D8...@apache.org>

Cheers
Jan
--

Re: Faster updates, optional ACID

Posted by Noah Slater <ns...@apache.org>.

On Mon, Jan 05, 2009 at 12:54:58PM -0500, Damien Katz wrote:
> For full acid commit, add a header field to the doc PUT or _bulk_docs
> POST like this:
>  X-Couch-Full-Commit:true

What is the cost/benefit of doing this via HTTP headers vs. query string?

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 5 Jan 2009, at 18:54, Damien Katz wrote:
> The last remaining work item is db server crash detection, so that  
> clients can detect when a server has crashed and potentially lost  
> updates. This is pretty simple, each db server just needs a unique  
> ID generated at it's startup. Client retrieve this value at the  
> beginning of the writes and then checks that the value is the same  
> once down a flushed to disk. If not, we know we maybe have lost some  
> updates and we redo the replication from the last known good commit.

We should document that pattern for clients that use CouchDB
and put in data without replication. Their writes might also not
go in.


> Right now the default is to delay the commits, because I think that  
> will be the most common use case but I'm really not sure. I  
> definitely want the commits delayed for the test suite, to keep  
> things running fast.

+1

Cheers
Jan
--

Re: Faster updates, optional ACID

Posted by Noah Slater <ns...@apache.org>.

On Mon, Jan 05, 2009 at 01:59:45PM -0500, Geir Magnusson Jr. wrote:
> That's cool but it puts the burden on the client - why not make it a
> config option, so that the db admin can choose the durability level in
> general, and let clients that know they are talking to couch override w/
> a header?

+1

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Faster updates, optional ACID

Posted by Damien Katz <da...@apache.org>.

On Jan 5, 2009, at 2:38 PM, Chris Anderson wrote:

> On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org>  
> wrote:
>>>
>>> 1) delayed commit (what you did last night)
>>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>>> provides this level of commit
>>>
>>
>> If necessary and possible, we'll patch the Erlang VM. But if a  
>> platform
>> doesn't support proper flushing, then it's not a platform that can  
>> support
>> an ACID database.
>>
>
> If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
> when we want to be ACID. On Linux, as far as we know, 2 will support
> ACID. There's only really two options, ACID commit, and delayed
> commit. It's up to CouchDB / Erlang / the OS to make sure that the
> ACID option isn't false advertising. Makes sense.

If fsync doesn't work on a platform (like previously on OS X), then
neither the delayed commit nor the fully flushed commit can work.

The way the delayed commit works, the database on disk is never in an  
inconsistent state. The new data may be partially written and the  
header in memory points to it, but the header on disk still points to  
the old consistent data. So this new option isn't like OS X before the  
F_FULLFSYNC patch, because that could actually cause corruptions  
(header points to incomplete structures) and you'd lose previously  
committed data. This new patch might lose the last uncommitted writes  
if it crashes before the final flush, but the previously committed  
data is always consistent and instantly available, no fixup needed.
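
A toy model of that behaviour, assuming an append-only file whose header simply records how many records are valid. The class and attribute names are made up for illustration and say nothing about CouchDB's actual file format.

```python
class AppendOnlyStore:
    """Toy append-only store with separate in-memory and on-disk headers."""

    def __init__(self):
        self.disk_data = []    # records appended to the file
        self.disk_header = 0   # record count the on-disk header points to
        self.mem_header = 0    # record count the in-memory header points to

    def write(self, record):
        # New data is appended; only the in-memory header sees it so far.
        self.disk_data.append(record)
        self.mem_header = len(self.disk_data)

    def full_commit(self):
        # Flush the data first, then make the on-disk header point to it.
        self.disk_header = self.mem_header

    def crash_and_reopen(self):
        # The in-memory header is lost; only the on-disk header is trusted,
        # so anything written after the last commit is discarded.
        del self.disk_data[self.disk_header:]
        self.mem_header = self.disk_header

    def read_all(self):
        return self.disk_data[:self.mem_header]


store = AppendOnlyStore()
store.write("first")
store.full_commit()
store.write("second")      # delayed: not yet covered by the on-disk header
store.crash_and_reopen()
recovered = store.read_all()
```

After the simulated crash only the committed record is visible, and no repair step was needed to get there.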

-Damien

>
>
>
>> Definitely, I think commit options should be settable per-database.  
>> But for
>> now I was just wanting to address the slowdown, especially for  
>> replication
>> and the tests, to keep everyone productive. More commit features  
>> and options
>> is lower priority work for now, I was just addresses the most serious
>> slowdown.
>
> The config system is so flexible and easy to hook into, that I'm not
> worried about adding these settings. Should be a breeze once the
> software itself is settled.
>
> +1 to the whole thing.
>
> -- 
> Chris Anderson
> http://jchris.mfdz.com

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

On Jan 5, 2009, at 2:38 PM, Chris Anderson wrote:

> On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org>  
> wrote:
>>>
>>> 1) delayed commit (what you did last night)
>>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>>> provides this level of commit
>>>
>>
>> If necessary and possible, we'll patch the Erlang VM. But if a  
>> platform
>> doesn't support proper flushing, then it's not a platform that can  
>> support
>> an ACID database.
>>
>
> If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
> when we want to be ACID. On Linux, as far as we know, 2 will support
> ACID. There's only really two options, ACID commit, and delayed
> commit. It's up to CouchDB / Erlang / the OS to make sure that the
> ACID option isn't false advertising. Makes sense.

[Disclaimer : I'm not an OS filesystem expert]

Except AFAIK that's not what is happening. fsync() on OS X or Linux
doesn't support the 'deterministic commit to physical media', which is
what I think you think you are getting.

As far as I can tell, only OS X gives you that full durability via
fcntl(F_FULLFSYNC), and it's horribly slow.

That's why I'm proposing three distinct modes (at least).

geir

Re: Faster updates, optional ACID

Posted by Chris Anderson <jc...@gmail.com>.

On Mon, Jan 5, 2009 at 11:32 AM, Damien Katz <da...@apache.org> wrote:
>>
>> 1) delayed commit (what you did last night)
>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>> 3) optional F_FULLSYNC commit, on OS X and any other platform that
>> provides this level of commit
>>
>
> If necessary and possible, we'll patch the Erlang VM. But if a platform
> doesn't support proper flushing, then it's not a platform that can support
> an ACID database.
>

If I follow correctly, you're saying that on OS X, 2 == 1, so we use 3
when we want to be ACID. On Linux, as far as we know, 2 will support
ACID. There's only really two options, ACID commit, and delayed
commit. It's up to CouchDB / Erlang / the OS to make sure that the
ACID option isn't false advertising. Makes sense.

> Definitely, I think commit options should be settable per-database. But for
> now I was just wanting to address the slowdown, especially for replication
> and the tests, to keep everyone productive. More commit features and options
> is lower priority work for now, I was just addresses the most serious
> slowdown.

The config system is so flexible and easy to hook into, that I'm not
worried about adding these settings. Should be a breeze once the
software itself is settled.

+1 to the whole thing.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 5 Jan 2009, at 20:51, Geir Magnusson Jr. wrote:
>
> fsync() on Linux and OS X flushes to disk.  I'm not suggesting that  
> it doesn't.
>
> What it doesn't do is flush the write caches *on* the disk unit  
> itself (which is what F_FULLSYNC supposedly does, hence the sloth)
>
> IOW, with fsync(), there's no guarantee that the bits get written to  
> the physical media.  As far as the OS knows, the FS caches are  
> flushed to the device, but the device may still be holding in it's  
> own RAM.

We haven't gotten any data-loss reports on Linux systems
using fsync(). We did for Mac OS X prior to the F_FULLFSYNC
patch in Erlang.

> re the FULLFSYNC change, do you have the option to not have it used,  
> but have fsync() used instead?

Nope, file:sync() (Erlang) calls fsync() (C) on Linux and
fcntl(F_FULLFSYNC) (C) on Darwin-based OSes, this is
hardcoded.
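
The same platform split can be sketched in Python, whose standard library exposes both calls. This only illustrates the behaviour described above; it is not the Erlang VM's code, and the helper name `sync_to_media` is made up.

```python
import os
import sys
import tempfile

try:
    import fcntl  # POSIX only; not available on Windows
except ImportError:
    fcntl = None


def sync_to_media(fd):
    """Flush fd as far as the platform allows; return the call that was used."""
    if (sys.platform == "darwin" and fcntl is not None
            and hasattr(fcntl, "F_FULLFSYNC")):
        # Darwin: ask the drive to flush its own write cache as well.
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        return "F_FULLFSYNC"
    # Elsewhere: plain fsync(), which hands the data to the device.
    os.fsync(fd)
    return "fsync"


with tempfile.TemporaryFile() as f:
    f.write(b"hello")
    f.flush()                      # move Python's buffer to the OS first
    used = sync_to_media(f.fileno())
```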

>> If necessary and possible, we'll patch the Erlang VM.
>
> That seems like a bad idea to me - I'd think you'd want to stay out  
> of the VM business.

We are pushing the Erlang VM in various ways. So far,
feedback was very welcome by the VM developers and
we were encouraged to keep this up. Note, that the code
of the VM is tightly controlled and patches are evaluated
thoroughly by Ericsson.

Cheers
Jan
--

Re: Faster updates, optional ACID

Posted by Randall Leeds <ra...@gmail.com>.

IIRC I saw something about OS X and this flush issue before. It came up when
I was talking to an FS person at Apple.

I believe that the explanation was that hardware manufacturers often lie
about the flush call to gain better benchmark scores. The way Apple ensures
that the full sync is actually complete is to stuff the buffer of the disk
with nonsense after the write to ensure that the real data has been pushed
out.

Something similar could without a doubt be done on Linux, but I don't know
that the OS handles it in any exposed way yet. I'm not familiar enough with
the appropriate Linux syscalls, but perhaps you could patch the Erlang VM to
do something similar to OS X: query and discover the size of the write
buffer on the hardware and write that much garbage in order to flush.

IMO this is an ugly software hack to patch a hardware problem. If you
_absolutely need_ the full flushing you should figure out which hard drive
manufacturers don't produce this sort of flawed fsync behavior or get
something like a battery-backed RAID. Unfortunately, we're stuck in a hard
place as a result of silly, competitive benchmark races by hard disk
manufacturers.

-Randall

On Mon, Jan 5, 2009 at 15:04, Damien Katz <da...@apache.org> wrote:

>
> On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
>
>> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>>
>>
>>> If necessary and possible, we'll patch the Erlang VM.
>>>
>>
>> That seems like a bad idea to me - I'd think you'd want to stay out of the
>> VM business.
>>
>
> No, I mean send patches to the maintainers of Erlang to fix any problems on
> their supported platforms.  Just like the F_FULLFSYNC patch.
>
>
>>
>>  But if a platform doesn't support proper flushing, then it's not a
>>> platform that can support an ACID database.
>>>
>>
>> We're not communicating well here.
>>
>> "proper flushing" depends on what you want to do - if you need your data
>> to in confirmed permanent storage so that it can survive a crash or power
>> cut, then w/o special configuration (e.g. battery-backed RAID, for example),
>> I don't think that you're going to get assurance on linux.
>>
>> Do you see what I'm saying?
>>
>>
> Yes I see what you are saying. Can you show that Linux doesn't actually
> safely push the bits to disk in popular distros? If that's the case, then we
> need to find the APIs that actually work and call them, and if they don't
> work, we don't support Linux.
>
>
>>>  why not make it a config option, so that the db admin can choose the
>>>> durability level in general, and let clients that know they are talking to
>>>> couch override w/ a header?
>>>>
>>>>
>>> Definitely, I think commit options should be settable per-database. But
>>> for now I was just wanting to address the slowdown, especially for
>>> replication and the tests, to keep everyone productive. More commit features
>>> and options is lower priority work for now, I was just addresses the most
>>> serious slowdown.
>>>
>>
>> That makes sense, but IMO you papered over the root problem.
>> It's good to keep people working, but I think the issue deserves a look.
>>  I don't know erlang, or I would look myself.
>>
>
> What issue? Why do you think this is Erlang specific?
>
>
>
>> geir
>>
>
>

Re: Faster updates, optional ACID

Posted by Lawrence Pit <la...@gmail.com>.


Chris Anderson asked me this earlier on the user list:

 > Quick question: was this version of CouchDB built freshly against the
 > latest Erlang? We've noticed that some performance changes on an
 > Erlang upgrade don't appear until after Couch is rebuilt.

Given your results below, I started thinking my memory probably wasn't
serving me right... usually I build from source, and I'm pretty sure I did
with CouchDB, but now I also remember I /tried/ installing it from darwin
ports or something similar, pre-compiled stuff, which, to my best
recollection, didn't work.

I don't know Erlang that well... could it be possible that the sources
I have for 0.8 were compiled with R12B-3, which when run under R12B-5
give good results, while if those sources are compiled with R12B-5 they
give bad results?



Cheers,
Lawrence

> For completeness sake:
>
> The previous results were against Erlang R12B-5.
>
> Here's R12B-3 which doesn't have the fsync() fix:
>
> CouchDB 0.8.0:
> Requests per second:    184.42 [#/sec] (mean)
>
>
> CouchDB 0.8.1:
> Requests per second:    185.74 [#/sec] (mean)
>
>
> CouchDB trunk r731451 (pre-async-commit-patch):
> Requests per second:    199.30 [#/sec] (mean)
>
> Cheers
> Jan
> -- 
>
> On 6 Jan 2009, at 16:10, Jan Lehnardt wrote:
>
>>
>> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>>
>>> Interesting indeed. I was seeing:
>>>
>>> CouchDB/0.8.0-incubating
>>>
>>> I assume that is different from CouchDB 0.8.1 ?
>>
>>
>> CouchDB 0.8.0:
>>
>> Requests per second:    5.29 [#/sec] (mean)
>>
>> Cheers
>> Jan
>> -- 
>>
>>>
>>> Cheers,
>>> Lawrence
>>>
>>>> Interesting.  I wonder that Lawrence was seeing...
>>>>
>>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>>
>>>>>
>>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>>
>>>>>> It was reported that w/ the same up-to-date version of erlang, 
>>>>>> they found a big performance difference between 0.8 and current 
>>>>>> trunk.  If that's true, then it seems to me that something 
>>>>>> changed in the filesystem handling in the CouchDB code itself - 
>>>>>> it could be that there are multiple flush modes, and the 0.8 code 
>>>>>> used whatever corresponds to fsync(), and trunk uses whatever 
>>>>>> corresponds to fnctl(F_FULLSYNC).  I don't know  It's a guess.  
>>>>>> But yesterdays results are unexplained, and I hate mysteries.
>>>>>
>>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  
>>>>> http://127.0.0.1:5984/test_suite_db
>>>>>
>>>>> CouchDB 0.8.1:
>>>>> Requests per second:    6.56 [#/sec] (mean)
>>>>>
>>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>>> Requests per second:    5.94 [#/sec] (mean)
>>>>>
>>>>>
>>>>> Cheers
>>>>> Jan
>>>>> -- 
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

For completeness' sake:

The previous results were against Erlang R12B-5.

Here's R12B-3 which doesn't have the fsync() fix:

CouchDB 0.8.0:
Requests per second:    184.42 [#/sec] (mean)


CouchDB 0.8.1:
Requests per second:    185.74 [#/sec] (mean)


CouchDB trunk r731451 (pre-async-commit-patch):
Requests per second:    199.30 [#/sec] (mean)
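
As a rough worked comparison, the sketch below puts these R12B-3 figures next to the R12B-5 figures reported earlier in the thread for the same `ab` run. The variable names are made up, and the per-request time is simply 1000 / (requests per second), i.e. the mean across all concurrent requests.

```python
# Requests per second reported in this thread for the same ab benchmark.
rps_r12b3 = {"0.8.0": 184.42, "0.8.1": 185.74, "trunk r731451": 199.30}
rps_r12b5 = {"0.8.0": 5.29, "0.8.1": 6.56, "trunk r731451": 5.94}

results = {}
for version, fast in rps_r12b3.items():
    slow = rps_r12b5[version]
    results[version] = {
        "slowdown": fast / slow,        # how many times slower on R12B-5
        "ms_before": 1000.0 / fast,     # mean ms per request on R12B-3
        "ms_after": 1000.0 / slow,      # mean ms per request on R12B-5
    }

for version, r in results.items():
    print(f"{version}: {r['slowdown']:.1f}x slower, "
          f"{r['ms_before']:.1f} ms -> {r['ms_after']:.1f} ms per request")
```

Every CouchDB version slows down by a similar factor, which fits the explanation that the difference comes from the Erlang release rather than from CouchDB itself.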

Cheers
Jan
--

On 6 Jan 2009, at 16:10, Jan Lehnardt wrote:

>
> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>
>> Interesting indeed. I was seeing:
>>
>> CouchDB/0.8.0-incubating
>>
>> I assume that is different from CouchDB 0.8.1 ?
>
>
> CouchDB 0.8.0:
>
> Requests per second:    5.29 [#/sec] (mean)
>
> Cheers
> Jan
> --
>
>>
>> Cheers,
>> Lawrence
>>
>>> Interesting.  I wonder that Lawrence was seeing...
>>>
>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>
>>>>
>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>
>>>>> It was reported that w/ the same up-to-date version of erlang,  
>>>>> they found a big performance difference between 0.8 and current  
>>>>> trunk.  If that's true, then it seems to me that something  
>>>>> changed in the filesystem handling in the CouchDB code itself -  
>>>>> it could be that there are multiple flush modes, and the 0.8  
>>>>> code used whatever corresponds to fsync(), and trunk uses  
>>>>> whatever corresponds to fnctl(F_FULLSYNC).  I don't know  It's a  
>>>>> guess.  But yesterdays results are unexplained, and I hate  
>>>>> mysteries.
>>>>
>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  http://127.0.0.1:5984/test_suite_db
>>>>
>>>> CouchDB 0.8.1:
>>>> Requests per second:    6.56 [#/sec] (mean)
>>>>
>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>> Requests per second:    5.94 [#/sec] (mean)
>>>>
>>>>
>>>> Cheers
>>>> Jan
>>>> -- 
>>>
>>>
>>
>>
>
>

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

the mystery gets better...

On Jan 6, 2009, at 10:10 AM, Jan Lehnardt wrote:

>
> On 6 Jan 2009, at 14:56, Lawrence Pit wrote:
>
>> Interesting indeed. I was seeing:
>>
>> CouchDB/0.8.0-incubating
>>
>> I assume that is different from CouchDB 0.8.1 ?
>
>
> CouchDB 0.8.0:
>
> Requests per second:    5.29 [#/sec] (mean)
>
> Cheers
> Jan
> --
>
>>
>> Cheers,
>> Lawrence
>>
>>> Interesting.  I wonder that Lawrence was seeing...
>>>
>>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>>
>>>>
>>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>>
>>>>> It was reported that w/ the same up-to-date version of erlang,  
>>>>> they found a big performance difference between 0.8 and current  
>>>>> trunk.  If that's true, then it seems to me that something  
>>>>> changed in the filesystem handling in the CouchDB code itself -  
>>>>> it could be that there are multiple flush modes, and the 0.8  
>>>>> code used whatever corresponds to fsync(), and trunk uses  
>>>>> whatever corresponds to fnctl(F_FULLSYNC).  I don't know  It's a  
>>>>> guess.  But yesterdays results are unexplained, and I hate  
>>>>> mysteries.
>>>>
>>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  http://127.0.0.1:5984/test_suite_db
>>>>
>>>> CouchDB 0.8.1:
>>>> Requests per second:    6.56 [#/sec] (mean)
>>>>
>>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>>> Requests per second:    5.94 [#/sec] (mean)
>>>>
>>>>
>>>> Cheers
>>>> Jan
>>>> -- 
>>>
>>>
>>
>>
>

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 6 Jan 2009, at 14:56, Lawrence Pit wrote:

> Interesting indeed. I was seeing:
>
> CouchDB/0.8.0-incubating
>
> I assume that is different from CouchDB 0.8.1 ?


CouchDB 0.8.0:

Requests per second:    5.29 [#/sec] (mean)

Cheers
Jan
--

>
> Cheers,
> Lawrence
>
>> Interesting.  I wonder that Lawrence was seeing...
>>
>> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>>
>>>
>>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>>
>>>> It was reported that w/ the same up-to-date version of erlang,  
>>>> they found a big performance difference between 0.8 and current  
>>>> trunk.  If that's true, then it seems to me that something  
>>>> changed in the filesystem handling in the CouchDB code itself -  
>>>> it could be that there are multiple flush modes, and the 0.8 code  
>>>> used whatever corresponds to fsync(), and trunk uses whatever  
>>>> corresponds to fnctl(F_FULLSYNC).  I don't know  It's a guess.   
>>>> But yesterdays results are unexplained, and I hate mysteries.
>>>
>>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  http://127.0.0.1:5984/test_suite_db
>>>
>>> CouchDB 0.8.1:
>>> Requests per second:    6.56 [#/sec] (mean)
>>>
>>> CouchDB trunk r731451 (pre-async-commit-patch):
>>> Requests per second:    5.94 [#/sec] (mean)
>>>
>>>
>>> Cheers
>>> Jan
>>> -- 
>>
>>
>
>

Re: Faster updates, optional ACID

Posted by Lawrence Pit <la...@gmail.com>.



Interesting indeed. I was seeing:

CouchDB/0.8.0-incubating

I assume that is different from CouchDB 0.8.1 ?


Cheers,
Lawrence

> Interesting.  I wonder that Lawrence was seeing...
>
> On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:
>
>>
>> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>>
>>> It was reported that w/ the same up-to-date version of erlang, they 
>>> found a big performance difference between 0.8 and current trunk.  
>>> If that's true, then it seems to me that something changed in the 
>>> filesystem handling in the CouchDB code itself - it could be that 
>>> there are multiple flush modes, and the 0.8 code used whatever 
>>> corresponds to fsync(), and trunk uses whatever corresponds to 
>>> fnctl(F_FULLSYNC).  I don't know  It's a guess.  But yesterdays 
>>> results are unexplained, and I hate mysteries.
>>
>> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  
>> http://127.0.0.1:5984/test_suite_db
>>
>> CouchDB 0.8.1:
>> Requests per second:    6.56 [#/sec] (mean)
>>
>> CouchDB trunk r731451 (pre-async-commit-patch):
>> Requests per second:    5.94 [#/sec] (mean)
>>
>>
>> Cheers
>> Jan
>> -- 
>
>

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

Interesting.  I wonder what Lawrence was seeing...

On Jan 6, 2009, at 7:11 AM, Jan Lehnardt wrote:

>
> On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>>
>> It was reported that w/ the same up-to-date version of erlang, they  
>> found a big performance difference between 0.8 and current trunk.   
>> If that's true, then it seems to me that something changed in the  
>> filesystem handling in the CouchDB code itself - it could be that  
>> there are multiple flush modes, and the 0.8 code used whatever  
>> corresponds to fsync(), and trunk uses whatever corresponds to  
>> fnctl(F_FULLSYNC).  I don't know  It's a guess.  But yesterdays  
>> results are unexplained, and I hate mysteries.
>
> $ ab -c 10 -n 1000 -p emptypost -T 'application/json'  http://127.0.0.1:5984/test_suite_db
>
> CouchDB 0.8.1:
> Requests per second:    6.56 [#/sec] (mean)
>
> CouchDB trunk r731451 (pre-async-commit-patch):
> Requests per second:    5.94 [#/sec] (mean)
>
>
> Cheers
> Jan
> --

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 6 Jan 2009, at 00:46, Geir Magnusson Jr. wrote:
>
> It was reported that w/ the same up-to-date version of erlang, they  
> found a big performance difference between 0.8 and current trunk.   
> If that's true, then it seems to me that something changed in the  
> filesystem handling in the CouchDB code itself - it could be that  
> there are multiple flush modes, and the 0.8 code used whatever  
> corresponds to fsync(), and trunk uses whatever corresponds to  
> fnctl(F_FULLSYNC).  I don't know  It's a guess.  But yesterdays  
> results are unexplained, and I hate mysteries.

$ ab -c 10 -n 1000 -p emptypost -T 'application/json'  http://127.0.0.1:5984/test_suite_db

CouchDB 0.8.1:
Requests per second:    6.56 [#/sec] (mean)

CouchDB trunk r731451 (pre-async-commit-patch):
Requests per second:    5.94 [#/sec] (mean)


Cheers
Jan
--

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

On Jan 5, 2009, at 3:04 PM, Damien Katz wrote:

>
> On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
>>
>> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>>
>>>
>>> If necessary and possible, we'll patch the Erlang VM.
>>
>> That seems like a bad idea to me - I'd think you'd want to stay out  
>> of the VM business.
>
> No, I mean send patches to the maintainers of Erlang to fix any  
> problems on their supported platforms.  Just like the F_FULLFSYNC  
> patch.

Ah.  Whew :)

>>
>>
>>> But if a platform doesn't support proper flushing, then it's not a  
>>> platform that can support an ACID database.
>>
>> We're not communicating well here.
>>
>> "proper flushing" depends on what you want to do - if you need your  
>> data to in confirmed permanent storage so that it can survive a  
>> crash or power cut, then w/o special configuration (e.g. battery- 
>> backed RAID, for example), I don't think that you're going to get  
>> assurance on linux.
>>
>> Do you see what I'm saying?
>>
>
> Yes I see what you are saying. Can you show that Linux doesn't  
> actually safely push the bits to disk in popular distros? If that's  
> the case, then we need to find the APIs that actually work and call  
> them, and if they don't work, we don't support Linux.

It pushes the bits to the disk drive, but that's where its sphere of
effect ends - what the drive does after that is drive specific.
Drives cache writes to aggregate, or write things out of order based
on head location, etc.

This isn't something that only affects Couch.

So I would say that .... it's time to relax.

Take the approach that you have a few modes

  a) fsync() mode - for people that care about true durability, it's
up to them to get or configure drives to behave right, or whatever
  b) the delayed write mode so that you can do things like aggregate
writes into clocked fsyncs or something  (I'd use this - I'll take the
performance trade for durability)
  c) and for platforms that offer special modes that really do
guarantee the write all the way to the physical media, like OS X's
fcntl(F_FULLFSYNC), make that an option too.
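
A minimal sketch of how the three modes could be selected, assuming the actual sync calls are passed in as plain functions. The names `Durability` and `Committer`, the one-second interval and the injected clock are all assumptions for illustration, not an existing CouchDB option.

```python
import enum
import time


class Durability(enum.Enum):
    FSYNC = "fsync"      # mode (a): fsync() after every write
    DELAYED = "delayed"  # mode (b): aggregate writes into clocked fsyncs
    FULL = "full"        # mode (c): platform-specific full flush


class Committer:
    """Decide after each write whether, and how, to sync."""

    def __init__(self, mode, fsync, full_sync, interval=1.0,
                 clock=time.monotonic):
        self.mode = mode
        self._fsync = fsync          # callable doing a plain fsync
        self._full_sync = full_sync  # callable doing the full flush
        self._interval = interval    # seconds between delayed syncs
        self._clock = clock
        self._last_sync = clock()

    def after_write(self):
        if self.mode is Durability.FULL:
            self._full_sync()
        elif self.mode is Durability.FSYNC:
            self._fsync()
        elif self._clock() - self._last_sync >= self._interval:
            # Delayed mode: sync only once the interval has elapsed.
            self._fsync()
            self._last_sync = self._clock()


calls = []
now = [0.0]  # fake clock so the example does not need to sleep
delayed = Committer(Durability.DELAYED,
                    fsync=lambda: calls.append("fsync"),
                    full_sync=lambda: calls.append("full"),
                    interval=1.0, clock=lambda: now[0])
delayed.after_write()   # too early, nothing is synced
now[0] = 1.5
delayed.after_write()   # interval elapsed, one fsync covers both writes
```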

>>>
>>>> why not make it a config option, so that the db admin can choose  
>>>> the durability level in general, and let clients that know they  
>>>> are talking to couch override w/ a header?
>>>>
>>>
>>> Definitely, I think commit options should be settable per- 
>>> database. But for now I was just wanting to address the slowdown,  
>>> especially for replication and the tests, to keep everyone  
>>> productive. More commit features and options is lower priority  
>>> work for now, I was just addresses the most serious slowdown.
>>
>> That makes sense, but IMO you papered over the root problem.
>> It's good to keep people working, but I think the issue deserves a  
>> look.  I don't know erlang, or I would look myself.
>
> What issue? Why do you think this is Erlang specific?

Oh - this is a SWAG based on one data point  :)  [it was a rough day -  
I didn't get to try to duplicate the results found yesterday...]

http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E

It was reported that w/ the same up-to-date version of erlang, they  
found a big performance difference between 0.8 and current trunk.  If  
that's true, then it seems to me that something changed in the  
filesystem handling in the CouchDB code itself - it could be that  
there are multiple flush modes, and the 0.8 code used whatever  
corresponds to fsync(), and trunk uses whatever corresponds to  
fcntl(F_FULLFSYNC).  I don't know.  It's a guess.  But yesterday's  
results are unexplained, and I hate mysteries.

I can't help with the erlang (I don't know it...), but I can at least  
try to reproduce the results...

geir

Re: Faster updates, optional ACID

Posted by Damien Katz <da...@apache.org>.

On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:

>
> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>
>>
>> If necessary and possible, we'll patch the Erlang VM.
>
> That seems like a bad idea to me - I'd think you'd want to stay out  
> of the VM business.

No, I mean send patches to the maintainers of Erlang to fix any  
problems on their supported platforms.  Just like the F_FULLFSYNC patch.

>
>
>> But if a platform doesn't support proper flushing, then it's not a  
>> platform that can support an ACID database.
>
> We're not communicating well here.
>
> "proper flushing" depends on what you want to do - if you need your  
> data to in confirmed permanent storage so that it can survive a  
> crash or power cut, then w/o special configuration (e.g. battery- 
> backed RAID, for example), I don't think that you're going to get  
> assurance on linux.
>
> Do you see what I'm saying?
>

Yes I see what you are saying. Can you show that Linux doesn't  
actually safely push the bits to disk in popular distros? If that's  
the case, then we need to find the APIs that actually work and call  
them, and if they don't work, we don't support Linux.

>>
>>> why not make it a config option, so that the db admin can choose  
>>> the durability level in general, and let clients that know they  
>>> are talking to couch override w/ a header?
>>>
>>
>> Definitely, I think commit options should be settable per-database.  
>> But for now I was just wanting to address the slowdown, especially  
>> for replication and the tests, to keep everyone productive. More  
>> commit features and options is lower priority work for now, I was  
>> just addresses the most serious slowdown.
>
> That makes sense, but IMO you papered over the root problem.
> It's good to keep people working, but I think the issue deserves a  
> look.  I don't know erlang, or I would look myself.

What issue? Why do you think this is Erlang specific?


>
> geir

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:

>
> On Jan 5, 2009, at 1:59 PM, Geir Magnusson Jr. wrote:
>
>>
>> On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:
>>
>>> It was brought to my attention that commits on OS X were very slow  
>>> with the latest releases of Erlang. After I upgraded to the most  
>>> recent version, I found them to be indeed slow, slowing the tests  
>>> down to point it was painful to run them. It appears, any disk  
>>> sync from Erlang now takes somewhere between 50 and 100ms, up from  
>>> the previous times of ~5ms. This is almost certainly due to the  
>>> F_FULLFSYNC flag the erlang file handling now uses on darwin based  
>>> systems, but it's surprising how bad performance on OS X. A little  
>>> investigation had shown other database engines have similar issues  
>>> on OS X.
>>
>> One user, though,  reported a huge performance difference between  
>> 0.8 and trunk *using the same version of Erlang*.
>>
>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
>>
>> To me, this hints that something changed in CouchDB that triggers  
>> usage of fcntl(F_FULLFSYNC).
>>
>> I haven't tried to duplicate but will.  If this can be verified, I  
>> think that this is an important mystery to solve.
>>
>>>
>>>
>>> To address this problem, I implemented delayed commit  
>>> functionality. We had always intended to implement delayed commit  
>>> for performance reasons, but hadn't had the need until now. This  
>>> makes updates much faster in the general case, but with the caveat  
>>> they aren't flushed completely to disk right way. If you can't  
>>> tolerate the possible loss of recent updates, you can use the  
>>> "full commit" option for ACID commits.
>>
>> So this bring up a question.  There is no way to get the same  
>> durability semantics on linux as you can get on OS X with F_FULLSYNC.
>
> On linux, as always it depends (on distro, file system, etc), but  
> generally fsync flushes to disk, or so I've been told by those who  
> should know and I've not seen credible evidence otherwise. But if  
> fsync is broken by default on Linux (like say Debian based distros),  
> file a bug and we'll see about get Erlang patched with the proper  
> apis (the Erlang F_FULLFSYNC change was from us too).

fsync() on Linux and OS X flushes to disk.  I'm not suggesting that it  
doesn't.

What it doesn't do is flush the write caches *on* the disk unit itself  
(which is what F_FULLFSYNC supposedly does, hence the sloth)

IOW, with fsync(), there's no guarantee that the bits get written to  
the physical media.  As far as the OS knows, the FS caches are flushed  
to the device, but the device may still be holding them in its own RAM.
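
The two levels can be sketched in Python as below. This only illustrates the distinction being drawn here; it assumes a Unix-like system (the fcntl module is not available on Windows), and it looks up fcntl.F_FULLFSYNC with getattr because the constant is only defined on platforms that provide it, such as OS X. The function names are made up.

```python
import fcntl
import os
import tempfile


def sync_to_device(fd):
    """Plain fsync(): ask the OS to push its cached data for fd to the drive."""
    os.fsync(fd)
    return "fsync"


def sync_to_media(fd):
    """Also ask the drive to flush its own cache, if the platform has a call for it."""
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is None:
        # No stronger call known here: fall back to plain fsync().
        return sync_to_device(fd)
    fcntl.fcntl(fd, full_fsync)
    return "F_FULLFSYNC"


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"commit record")
        f.flush()  # move Python's own buffer into the OS first
        return sync_to_device(f.fileno()), sync_to_media(f.fileno())
```

On a platform without the constant, both calls end up doing the same thing, which is exactly the "different levels of durability depending on platform" concern.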

re the FULLFSYNC change, do you have the option to not have it used,  
but have fsync() used instead?

>
>
>> This means that the "full commit" option really gives you different  
>> levels of durability, depending on whether or not you are on OS X.
>>
>> And thinking more about what appears to be the perf bug/slowdown in  
>> CouchDB code, might his warrant three options?
>>
>> 1) delayed commit (what you did last night)
>> 2) fsync() commit (what I suspect Couch did on and around 0.8)
>> 3) optional F_FULLSYNC commit, on OS X and any other platform that  
>> provides this level of commit
>>
>
> If necessary and possible, we'll patch the Erlang VM.

That seems like a bad idea to me - I'd think you'd want to stay out of  
the VM business.

> But if a platform doesn't support proper flushing, then it's not a  
> platform that can support an ACID database.

We're not communicating well here.

"proper flushing" depends on what you want to do - if you need your  
data to be in confirmed permanent storage so that it can survive a crash  
or power cut, then w/o special configuration (e.g. battery-backed  
RAID, for example), I don't think that you're going to get assurance  
on linux.

Do you see what I'm saying?


>
>
>>>
>>>
>>> For full acid commit, add a header field to the doc PUT or  
>>> _bulk_docs POST like this:
>>> X-Couch-Full-Commit:true
>>>
>>> Then couchdb will completely commit the change before returning.
>>
>>
>> That's cool but it puts the burden on the client -
>
> True, but they already have the API they must conform too, I don't  
> see this option as being particularly burdensome unless it's simply  
> the wrong default.

But it keeps adding requirements to the API that aren't really in the  
application domain necessarily.

>
>
>> why not make it a config option, so that the db admin can choose  
>> the durability level in general, and let clients that know they are  
>> talking to couch override w/ a header?
>>
>
> Definitely, I think commit options should be settable per-database.  
> But for now I was just wanting to address the slowdown, especially  
> for replication and the tests, to keep everyone productive. More  
> commit features and options is lower priority work for now, I was  
> just addresses the most serious slowdown.

That makes sense, but IMO you papered over the root problem.  It's  
good to keep people working, but I think the issue deserves a look.  I  
don't know erlang, or I would look myself.

geir

>
>
> Also, having the default be delayed commit will help us flush out  
> any problems, especially for usability and real productioning  
> testing. If it's broken, either a simple bug or by design, we need  
> to know it as soon as possible.
>
> -Damien
>
>>
>> geir
>>
>>>
>>>
>>> Also, if you have several delayed updates and you want to make  
>>> sure they all made it to disk, you can invoke POST /db/ 
>>> _ensure_full_commit and all outstanding commits are flushed to disk.
>>>
>>> The view engine has been already modified to deal with delayed  
>>> commits too, it ensures it never fully commits it's own indexes to  
>>> disk if the documents indexed aren't already committed to disk.
>>>
>>> The last remaining work item is db server crash detection, so that  
>>> clients can detect when a server has crashed and potentially lost  
>>> updates. This is pretty simple, each db server just needs a unique  
>>> ID generated at it's startup. Client retrieve this value at the  
>>> beginning of the writes and then checks that the value is the same  
>>> once down a flushed to disk. If not, we know we maybe have lost  
>>> some updates and we redo the replication from the last known good  
>>> commit.
>>>
>>> Right now the default is to delay the commits, because I think  
>>> that will be the most common use case but I'm really not sure. I  
>>> definitely want the commits delayed for the test suite, to keep  
>>> things running fast.
>>>
>>> -Damien
>>>
>>>
>>
>

Re: Faster updates, optional ACID

Posted by Damien Katz <da...@apache.org>.

On Jan 5, 2009, at 1:59 PM, Geir Magnusson Jr. wrote:

>
> On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:
>
>> It was brought to my attention that commits on OS X were very slow  
>> with the latest releases of Erlang. After I upgraded to the most  
>> recent version, I found them to be indeed slow, slowing the tests  
>> down to point it was painful to run them. It appears, any disk sync  
>> from Erlang now takes somewhere between 50 and 100ms, up from the  
>> previous times of ~5ms. This is almost certainly due to the  
>> F_FULLFSYNC flag the erlang file handling now uses on darwin based  
>> systems, but it's surprising how bad performance on OS X. A little  
>> investigation had shown other database engines have similar issues  
>> on OS X.
>
> One user, though,  reported a huge performance difference between  
> 0.8 and trunk *using the same version of Erlang*.
>
> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E
>
> To me, this hints that something changed in CouchDB that triggers  
> usage of fcntl(F_FULLFSYNC).
>
> I haven't tried to duplicate but will.  If this can be verified, I  
> think that this is an important mystery to solve.
>
>>
>>
>> To address this problem, I implemented delayed commit  
>> functionality. We had always intended to implement delayed commit  
>> for performance reasons, but hadn't had the need until now. This  
>> makes updates much faster in the general case, but with the caveat  
>> they aren't flushed completely to disk right way. If you can't  
>> tolerate the possible loss of recent updates, you can use the "full  
>> commit" option for ACID commits.
>
> So this bring up a question.  There is no way to get the same  
> durability semantics on linux as you can get on OS X with F_FULLSYNC.

On linux, as always it depends (on distro, file system, etc), but  
generally fsync flushes to disk, or so I've been told by those who  
should know and I've not seen credible evidence otherwise. But if  
fsync is broken by default on Linux (like say Debian based distros),  
file a bug and we'll see about getting Erlang patched with the proper  
APIs (the Erlang F_FULLFSYNC change was from us too).

> This means that the "full commit" option really gives you different  
> levels of durability, depending on whether or not you are on OS X.
>
> And thinking more about what appears to be the perf bug/slowdown in  
> CouchDB code, might his warrant three options?
>
> 1) delayed commit (what you did last night)
> 2) fsync() commit (what I suspect Couch did on and around 0.8)
> 3) optional F_FULLSYNC commit, on OS X and any other platform that  
> provides this level of commit
>

If necessary and possible, we'll patch the Erlang VM. But if a  
platform doesn't support proper flushing, then it's not a platform  
that can support an ACID database.

>>
>>
>> For full acid commit, add a header field to the doc PUT or  
>> _bulk_docs POST like this:
>> X-Couch-Full-Commit:true
>>
>> Then couchdb will completely commit the change before returning.
>
>
> That's cool but it puts the burden on the client -

True, but they already have the API they must conform to; I don't see  
this option as being particularly burdensome unless it's simply the  
wrong default.
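
For reference, the two client-side calls under discussion might look roughly like this in Python. The header name and the _ensure_full_commit path are the ones given in this thread; the host, port, database name and function names are assumptions, and the requests are only built here, not sent.

```python
import urllib.request

BASE = "http://localhost:5984/mydb"  # assumed host, port and database name


def full_commit_put(doc_id, body):
    """Build (but do not send) a document PUT that asks for a full commit."""
    return urllib.request.Request(
        f"{BASE}/{doc_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Couch-Full-Commit": "true",
        },
    )


def ensure_full_commit():
    """Build (but do not send) the request that flushes outstanding delayed commits."""
    return urllib.request.Request(
        f"{BASE}/_ensure_full_commit", data=b"", method="POST"
    )
```

Passing either object to urllib.request.urlopen() would send it. Note that urllib normalizes the capitalization of header names it stores, which is harmless because HTTP header names are case-insensitive.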

> why not make it a config option, so that the db admin can choose the  
> durability level in general, and let clients that know they are  
> talking to couch override w/ a header?
>

Definitely, I think commit options should be settable per-database.  
But for now I just wanted to address the slowdown, especially for  
replication and the tests, to keep everyone productive. More commit  
features and options are lower-priority work for now; I was just  
addressing the most serious slowdown.

Also, having the default be delayed commit will help us flush out any  
problems, especially for usability and real production testing. If  
it's broken, whether by a simple bug or by design, we need to know as  
soon as possible.

-Damien

>
> geir
>
>>
>>
>> Also, if you have several delayed updates and you want to make sure  
>> they all made it to disk, you can invoke POST /db/ 
>> _ensure_full_commit and all outstanding commits are flushed to disk.
>>
>> The view engine has been already modified to deal with delayed  
>> commits too, it ensures it never fully commits it's own indexes to  
>> disk if the documents indexed aren't already committed to disk.
>>
>> The last remaining work item is db server crash detection, so that  
>> clients can detect when a server has crashed and potentially lost  
>> updates. This is pretty simple, each db server just needs a unique  
>> ID generated at it's startup. Client retrieve this value at the  
>> beginning of the writes and then checks that the value is the same  
>> once down a flushed to disk. If not, we know we maybe have lost  
>> some updates and we redo the replication from the last known good  
>> commit.
>>
>> Right now the default is to delay the commits, because I think that  
>> will be the most common use case but I'm really not sure. I  
>> definitely want the commits delayed for the test suite, to keep  
>> things running fast.
>>
>> -Damien
>>
>>
>

Re: Faster updates, optional ACID

Posted by "Geir Magnusson Jr." <ge...@pobox.com>.

On Jan 5, 2009, at 12:54 PM, Damien Katz wrote:

> It was brought to my attention that commits on OS X were very slow  
> with the latest releases of Erlang. After I upgraded to the most  
> recent version, I found them to be indeed slow, slowing the tests  
> down to point it was painful to run them. It appears, any disk sync  
> from Erlang now takes somewhere between 50 and 100ms, up from the  
> previous times of ~5ms. This is almost certainly due to the  
> F_FULLFSYNC flag the erlang file handling now uses on darwin based  
> systems, but it's surprising how bad performance on OS X. A little  
> investigation had shown other database engines have similar issues  
> on OS X.

One user, though,  reported a huge performance difference between 0.8  
and trunk *using the same version of Erlang*.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3C49619897.7000104@gmail.com%3E

To me, this hints that something changed in CouchDB that triggers  
usage of fcntl(F_FULLFSYNC).

I haven't tried to duplicate but will.  If this can be verified, I  
think that this is an important mystery to solve.

>
>
> To address this problem, I implemented delayed commit functionality.  
> We had always intended to implement delayed commit for performance  
> reasons, but hadn't had the need until now. This makes updates much  
> faster in the general case, but with the caveat they aren't flushed  
> completely to disk right way. If you can't tolerate the possible  
> loss of recent updates, you can use the "full commit" option for  
> ACID commits.

So this brings up a question.  There is no way to get the same  
durability semantics on linux as you can get on OS X with F_FULLFSYNC.   
This means that the "full commit" option really gives you different  
levels of durability, depending on whether or not you are on OS X.

And thinking more about what appears to be the perf bug/slowdown in  
CouchDB code, might this warrant three options?

1) delayed commit (what you did last night)
2) fsync() commit (what I suspect Couch did on and around 0.8)
3) optional F_FULLFSYNC commit, on OS X and any other platform that  
provides this level of commit

>
>
> For full acid commit, add a header field to the doc PUT or  
> _bulk_docs POST like this:
> X-Couch-Full-Commit:true
>
> Then couchdb will completely commit the change before returning.


That's cool but it puts the burden on the client - why not make it a  
config option, so that the db admin can choose the durability level in  
general, and let clients that know they are talking to couch override  
w/ a header?
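
What I have in mind is roughly the following, sketched in Python. The server_default argument stands for a hypothetical admin-set config value, and accepting "false" in the header as a way to opt out is my assumption; only the "true" form has been described so far.

```python
def wants_full_commit(headers, server_default=False):
    """Return True if this request should be fully committed before replying."""
    for name, value in headers.items():
        # Header names are case-insensitive in HTTP.
        if name.lower() == "x-couch-full-commit":
            return value.strip().lower() == "true"
    # No header: fall back to whatever the db admin configured.
    return server_default
```

Clients that know nothing about couch get the admin's choice, and clients that do know can still override it per request.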


geir

>
>
> Also, if you have several delayed updates and you want to make sure  
> they all made it to disk, you can invoke POST /db/ 
> _ensure_full_commit and all outstanding commits are flushed to disk.
>
> The view engine has been already modified to deal with delayed  
> commits too, it ensures it never fully commits it's own indexes to  
> disk if the documents indexed aren't already committed to disk.
>
> The last remaining work item is db server crash detection, so that  
> clients can detect when a server has crashed and potentially lost  
> updates. This is pretty simple, each db server just needs a unique  
> ID generated at it's startup. Client retrieve this value at the  
> beginning of the writes and then checks that the value is the same  
> once down a flushed to disk. If not, we know we maybe have lost some  
> updates and we redo the replication from the last known good commit.
>
> Right now the default is to delay the commits, because I think that  
> will be the most common use case but I'm really not sure. I  
> definitely want the commits delayed for the test suite, to keep  
> things running fast.
>
> -Damien
>
>

Re: Faster updates, optional ACID

Posted by Jan Lehnardt <ja...@apache.org>.

On 5 Jan 2009, at 18:54, Damien Katz wrote:
> The last remaining work item is db server crash detection, so that  
> clients can detect when a server has crashed and potentially lost  
> updates. This is pretty simple, each db server just needs a unique  
> ID generated at it's startup. Client retrieve this value at the  
> beginning of the writes and then checks that the value is the same  
> once down a flushed to disk. If not, we know we maybe have lost some  
> updates and we redo the replication from the last known good commit.

We should document that pattern for clients that use CouchDB
and put in data without replication. Their writes might also not
go in.
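
A rough sketch of what that documented pattern could look like for a plain client, in Python. FakeServer is a stand-in so the example runs on its own; the instance_id attribute, the method names and the retry limit are all hypothetical, since the crash-detection ID is not implemented yet.

```python
class FakeServer:
    """Stand-in for a db server; a real client would talk HTTP instead."""

    def __init__(self):
        self.instance_id = 1   # hypothetical ID generated at startup
        self.pending = []      # delayed updates, not yet on disk
        self.committed = []

    def write(self, doc):
        self.pending.append(doc)

    def ensure_full_commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def crash_and_restart(self):
        self.pending.clear()   # delayed updates are lost
        self.instance_id += 1  # a fresh ID after restart


def write_batch(server, docs, max_attempts=3, before_commit=None):
    """Write docs, flush, and redo the batch if the server restarted meanwhile."""
    for _ in range(max_attempts):
        start_id = server.instance_id
        for doc in docs:
            server.write(doc)
        if before_commit:
            before_commit(server)  # hook used only to simulate a crash
        server.ensure_full_commit()
        if server.instance_id == start_id:
            return True            # same instance: the batch reached disk
    return False
```

The check is cheap: one ID read before the writes and one after the flush, with a redo of the batch only when the two differ.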


> Right now the default is to delay the commits, because I think that  
> will be the most common use case but I'm really not sure. I  
> definitely want the commits delayed for the test suite, to keep  
> things running fast.

+1

Cheers
Jan
--
