Posted to user@couchdb.apache.org by fana <fa...@2flub.org> on 2009/10/29 07:28:33 UTC

What happens with a document, if a conflict is not resolved?

Hi,

I read the book, Wiki and some Blogs about CouchDB,
but there is still a question in my mind.

If a document is in conflict, the application has to resolve it.
But what, if this never happens?
Can the document in conflict still be read and edited?
Or is it unavailable until the conflict is resolved?

Re: What happens with a document, if a conflict is not resolved?

Posted by Damien Katz <da...@apache.org>.
On Nov 1, 2009, at 4:29 PM, Brian Candler wrote:

> On Fri, Oct 30, 2009 at 06:46:24AM -0400, Damien Katz wrote:
>>> * Application writer curses couchdb, and curses the person who wrote
>>> "Most applications require no special planning to take advantage of
>>> distributed updates and replication".
>>
>> It sounds like the dev didn't read the documentation.
>
> I believe the documentation is weak in this area, but I am attempting to
> improve it.

I forgot in my last response, but I've noticed and thank you very  
much :)

-Damien

Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Sun, Nov 01, 2009 at 05:46:19PM -0500, Damien Katz wrote:
> I'm guessing you are taking some issue with the wording of this on the
> website. "Most applications require no special planning to take  
> advantage of distributed updates and replication." from here: 
> http://couchdb.apache.org/docs/intro.html

Yes, also this:
"Using just the basic replication model, many traditionally single server
database applications can be made distributed with almost no extra work"
http://couchdb.apache.org/docs/overview.html

This language is vague and unquantifiable. For example, I could easily argue
that "many" traditional database applications rely on SQL transaction
semantics, and such applications would require substantial rework to move to
couchdb with multi-master replication.

Don't get me wrong. I love CouchDB, and its incremental map-reduce is
without compare. Its replication model is logical and sound. But it does
take work to make use of it properly.

> For apps that are document oriented, I think this claim is true. I  
> suppose we could say "many" instead of "most" for less controversial  
> wording.
>
> However, for "many" apps there is no need for special planning, it's  
> enough to present the conflicting docs to the user

Right. So you're saying that the user interface should know about conflicts
and present them to the user, and I'd agree that's the minimum level of
conflict handling which is required of a multi-master application.

That requires writing your application so that it:
- fetches documents with GET ?conflicts=true
- iterates GET to fetch the others if _conflicts member is present
- displays them to the user
- allows the user to edit the final version
- saves the final version back and deletes the conflicts
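
The steps above can be sketched in Python. This is only an illustration
under stated assumptions, not code from this thread: the helper name
build_resolution is hypothetical, the sample document reuses revision ids
quoted elsewhere in this thread, and the HTTP calls themselves (GET with
?conflicts=true, then POST to _bulk_docs) are left out so the sketch runs
without a server. It builds the request body which saves the final version
and deletes the conflicting revisions in one request.

```python
import json


def build_resolution(final_doc, winner_rev, conflict_revs):
    """Build a _bulk_docs body: update the winner, delete the other revisions."""
    doc_id = final_doc["_id"]
    # The user's final version is written on top of the winning revision.
    updated = dict(final_doc, _rev=winner_rev)
    # _conflicts only appears in the GET response; do not store it back.
    updated.pop("_conflicts", None)
    # Every other conflicting revision gets a deletion stub.
    deletions = [
        {"_id": doc_id, "_rev": rev, "_deleted": True} for rev in conflict_revs
    ]
    return {"docs": [updated] + deletions}


# Shape of a document fetched with GET /db/test?conflicts=true
fetched = {
    "_id": "test",
    "_rev": "10-ca7d1f8bc31d20ad4898e4024547c4c6",
    "hello": "baz",
    "_conflicts": [
        "9-9b4928288c4fc3087833549752bdcdc3",
        "9-2438fa9601119c8fe2c0b2f4a688f65f",
    ],
}

# Pretend the user merged the versions and chose a new value (illustrative).
final = dict(fetched, hello="merged by user")
body = build_resolution(final, fetched["_rev"], fetched["_conflicts"])
print(json.dumps(body, indent=2))
```

The printed body is what would be POSTed to /db/_bulk_docs. Fetching the
bodies of the revisions listed in _conflicts still takes one GET per
revision, which is the extra work the list above describes.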

I consider writing your application this way "special planning" because new
users are discouraged from following this path.

- By default, couchdb hides conflicts in the HTTP API

- The experience they will get using PUT is that conflicts don't arise in
the first place, and this is also advertised as a feature:
http://couchdb.apache.org/docs/overview.html

| Document edits
| are made by client applications loading documents, applying changes, and
| saving them back to the database. If another client editing the same
| document saves their changes first, the client gets an edit conflict error
| on save. To resolve the update conflict, the latest document version can be
| opened, the edits reapplied and the update tried again.

Of course, what is written is completely true. What is unsaid is that this
feature is incompatible with multi-master replication.

I think all I'm asking for is:
- the overview documentation should be explicit that there are *two*
  mechanisms for dealing with conflicting updates
- make it clear that if you only implement the first/simple/obvious
  one, your application will not be multi-master capable

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by Damien Katz <da...@apache.org>.
On Nov 1, 2009, at 4:29 PM, Brian Candler wrote:

> On Fri, Oct 30, 2009 at 06:46:24AM -0400, Damien Katz wrote:
>>> * Application writer curses couchdb, and curses the person who wrote
>>> "Most applications require no special planning to take advantage of
>>> distributed updates and replication".
>>
>> It sounds like the dev didn't read the documentation.
>
> I believe the documentation is weak in this area, but I am attempting to
> improve it.
>
> I think it's fair to say that the first documentation anyone reads about
> CouchDB screams out: "RESTful HTTP API". Most people associate "REST" with
> GET and PUT. This in turn leads them down the path of writing apps which use
> PUT, and handling conflicts by trapping 409 responses and retrying at write
> time.
>
> It works at first, but this approach will not serve them when they get to
> distributed updates and replication. At very least, they'll need another
> layer of conflict resolution, and that could involve substantial application
> changes.
>
> With some "special planning" they could have decided to use the somewhat
> obscure POST-to-_bulk_docs-with-all_or_nothing-true instead of PUT, and
> implemented conflict resolution just once in a way which *does* also work
> with distributed updates and replication.

I'm guessing you are taking some issue with the wording of this on
the website. "Most applications require no special planning to take  
advantage of distributed updates and replication." from here: http://couchdb.apache.org/docs/intro.html

For apps that are document oriented, I think this claim is true. I  
suppose we could say "many" instead of "most" for less controversial  
wording.

However, for "many" apps there is no need for special planning, it's  
enough to present the conflicting docs to the user with simple diffing  
tools. This detection and handling can be added after the app is
built and is sufficient for many common use cases.

>
>> Patches are welcome, and most everything you propose could be done in
>> front end that's not that involved.
>
> Some of it can be done in a front-end.
>
> Reviewing my own code, most often I retrieve documents by using a multi-key
> fetch against a view (either _all_docs or a user view). Writing a front-end
> to that which also gives you the information needed to resolve conflicts is
> hard - I think you'd end up having to GET each document individually, which
> rather defeats the object of include_docs=true. I've raised that as
> COUCHDB-549.

Thanks.

>
> Regards,
>
> Brian.


Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Fri, Oct 30, 2009 at 06:46:24AM -0400, Damien Katz wrote:
>> * Application writer curses couchdb, and curses the person who wrote
>>  "Most applications require no special planning to take advantage of
>>  distributed updates and replication".
>
> It sounds like the dev didn't read the documentation.

I believe the documentation is weak in this area, but I am attempting to
improve it.

I think it's fair to say that the first documentation anyone reads about
CouchDB screams out: "RESTful HTTP API". Most people associate "REST" with
GET and PUT. This in turn leads them down the path of writing apps which use
PUT, and handling conflicts by trapping 409 responses and retrying at write
time.
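
The write-time pattern described above (load, change, PUT, retry on 409) can
be sketched as follows. This is a minimal illustration, not CouchDB client
code: FakeDb is a hypothetical in-memory stand-in for a single node, the
Conflict exception stands in for an HTTP 409 response, and the "N-fake"
revision strings are invented for the example.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


class FakeDb:
    """Hypothetical in-memory stand-in for one CouchDB node (illustration only)."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        # Reject the write if the caller did not start from the stored revision.
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        generation = int(doc["_rev"].split("-")[0]) + 1 if "_rev" in doc else 1
        stored = dict(doc, _rev=f"{generation}-fake")
        self.docs[doc["_id"]] = stored
        return stored["_rev"]


def update_with_retry(db, doc_id, change, max_attempts=5):
    """Load the latest version, reapply the edit, and retry on a conflict."""
    for _ in range(max_attempts):
        doc = db.get(doc_id)
        change(doc)
        try:
            return db.put(doc)
        except Conflict:
            continue  # someone else saved first: reload and try again
    raise RuntimeError("gave up after repeated conflicts")


db = FakeDb()
db.put({"_id": "test", "count": 0})
stale = db.get("test")  # another client loads the document...
update_with_retry(db, "test", lambda d: d.update(count=d["count"] + 1))
try:
    db.put(dict(stale, count=99))  # ...and tries to save it too late
except Conflict:
    print("409: reload, reapply, retry")
print(db.get("test"))
```

The sketch only covers two writers talking to the same node. Conflicts
introduced by replication between nodes are never reported through this
write-time error, so the retry loop never sees them.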

It works at first, but this approach will not serve them when they get to
distributed updates and replication. At very least, they'll need another
layer of conflict resolution, and that could involve substantial application
changes.

With some "special planning" they could have decided to use the somewhat
obscure POST-to-_bulk_docs-with-all_or_nothing-true instead of PUT, and
implemented conflict resolution just once in a way which *does* also work
with distributed updates and replication.

> Patches are welcome, and most everything you propose could be done in  
> front end that's not that involved.

Some of it can be done in a front-end.

Reviewing my own code, most often I retrieve documents by using a multi-key
fetch against a view (either _all_docs or a user view). Writing a front-end
to that which also gives you the information needed to resolve conflicts is
hard - I think you'd end up having to GET each document individually, which
rather defeats the object of include_docs=true. I've raised that as
COUCHDB-549.

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by Damien Katz <da...@apache.org>.
On Oct 30, 2009, at 4:33 AM, Brian Candler wrote:

> On Thu, Oct 29, 2009 at 01:51:57PM -0400, Damien Katz wrote:
>>> Is this a sensible API? You decide. I've given my opinion previously.
>>
>>
>> This api seems weird, but it's the closest thing we can have to
>> multi-document transactions in CouchDB and be a distributed, partitioned
>> database. This is because it's pretty much impossible to support
>> all-or-nothing conflict checking transactions with partitioned database
>> without some sort of double-lock checking, which is slow and expensive.
>
> I don't want to prevent conflicts, nor do I want transactions. As you say,
> introducing conflicting revisions is a fact of life in a distributed-master
> system.
>
> However, I believe that CouchDB's API actively discourages people from
> writing apps which deal with conflicts properly, by (a) hiding them, and (b)
> making resolve-on-read a multi-step process (e.g. readA, readB, readC,
> writeA, deleteB, deleteC) which itself is race-prone and may lead to more
> conflicts and odd intermediate states (*)

This is true if the conflicts are being resolved on more than one  
node. You can't avoid this.

>
> What I would like to see is the following.
>
> 1. When you request document X, you get *all* conflicting revisions in one
>    go. That is, they are treated as equal peers; none is promoted to winner.
>
>    (However, the list can be sorted in a deterministic order, so you could
>    get the current behaviour by just picking the first revision from the
>    list)
>
> 2. When you perform this request, you get a single "context" tag
>    which identifies this particular *set* of revisions.
>
> 3. When you write back the new document, you supply the context tag, and
>    this simultaneously supersedes all the other documents. Effectively this
>    would be like the _rev you use today, but it would refer to the set.
>    It could actually just be an array of _revs, but the user should treat
>    it as an opaque tag.
>
> 4. Views get to see the whole set of revisions too. Again, if they want
>    today's behaviour they can just use docs[0] and ignore the others; but
>    if they want to resolve conflicts they can too.
>
> 5. If two clients replace a document or set of conflicts with a new
>    document, and the new documents are identical, then they are not
>    treated as conflicts.
>
> When reading papers on systems like Dynamo, they all seem to have properties
> (1)-(3). That is: it's treated as natural that conflicts should arise; that
> these are fully exposed to the client; and the client is given the
> opportunity to resolve them in a single step.

That's a matter of opinion. It sounds more difficult from a client
perspective to me to have to deal with conflicts on every read  
operation.

>
>> If you want an easier API for saving documents into a conflicted state
>> (something like ?conflict=ok), that would be a fairly easy patch to
>> make. But I'm not sure why users would want that for a single document.
>
> I think that ultimately the 409 behaviour could be dropped if conflicts were
> handled as above, but that's not my number one concern.
>
> My concern is this:
>
> * Someone writes an application
>
> * They use the "obvious" API: i.e. simple GET and PUT for reading and
>   updating documents. They code to the 409 for avoiding conflicts. It all
>   works fine and they are delighted with couchdb.
>
> * They switch to multi-master
>
> * All hell breaks loose. Users see their docs vanishing. Application writer
>   finally works out how to do conflict management properly, and has to
>   rewrite the app entirely so that (for example) one GET becomes a
>   GET with ?conflicts=true, followed by multiple GETs for the additional
>   versions, followed by conflict resolution followed by a POST
>   to _bulk_docs to replace the original document and conflicts.
>
> * Application writer curses couchdb, and curses the person who wrote
>   "Most applications require no special planning to take advantage of
>   distributed updates and replication".

It sounds like the dev didn't read the documentation.

>
> Yes, I know patches are welcome. The reason I'm not contributing code for
> this right now is that I have higher priorities - I'm happy to keep my app
> 409-tied while I work on other things. But at the back of my mind, I know
> that I won't be going multi-master for a long time, if ever.

Patches are welcome, and most everything you propose could be done in  
front end that's not that involved.

>
> Regards,
>
> Brian.
>
> (*) Yes, I know that *with care* you can do the writes and deletes together
> as a single _bulk_docs operation, and even bind them together using
> "all_or_nothing":true. But this is not obvious. And there are still races.
> For example, I'm not sure that you can use a multi-key fetch for getting all
> the conflicting revisions in one hit, so you have a series of GETs, and you
> may find that the revs you're GETting have vanished by the time you read
> them.


Re: What happens with a document, if a conflict is not resolved?

Posted by Devon Weller <dw...@devonweller.com>.
I second that - thanks Brian.  I am a new CouchDB user and this was  
something I was still unclear about.  Yay for more documentation!

- Devon

On Oct 30, 2009, at 8:40 AM, Freddy Bowen wrote:

> Brian, that wiki page is great - a thoughtful review of the issue.  Thanks
> for putting it all in one place for me to read and reference!
>
> FB
>
>> On Fri, Oct 30, 2009 at 08:33:52AM +0000, Brian Candler wrote:
>> On the other hand, I am happy to contribute some documentation. I just
>> wrote the following page, which turned out to be longer than expected:
>> http://wiki.apache.org/couchdb/Replication_and_conflicts


Re: What happens with a document, if a conflict is not resolved?

Posted by Freddy Bowen <fr...@gmail.com>.
Brian, that wiki page is great - a thoughtful review of the issue.  Thanks
for putting it all in one place for me to read and reference!

FB


On Fri, Oct 30, 2009 at 9:15 AM, Brian Candler <B....@pobox.com> wrote:

> On Fri, Oct 30, 2009 at 08:33:52AM +0000, Brian Candler wrote:
> > Yes, I know patches are welcome. The reason I'm not contributing code for
> > this right now is that I have higher priorities - I'm happy to keep my app
> > 409-tied while I work on other things.
>
> On the other hand, I am happy to contribute some documentation. I just
> wrote the following page, which turned out to be longer than expected:
> http://wiki.apache.org/couchdb/Replication_and_conflicts
>
> Some interesting things came out while working this through. You can see an
> example Ruby script I wrote which replaces GET and PUT with multi-rev
> equivalents. I don't know of any existing client library which does this,
> but once you're clear what is needed, it's quite straightforward.
>
> Then if you use this, your application can forget about all 409 handling,
> because you'll never see one.
>
> I think turning this into a native API would be good - even if it's just
> GET ?all=true and PUT ?revs=[rev1,rev2,rev3]
>
> Regards,
>
> Brian.
>

Re: What happens with a document, if a conflict is not resolved?

Posted by Damien Katz <da...@apache.org>.
Please feel free to help make this better. All the information you  
want is available on the erlang side, it just sounds like you need it  
exposed in a slightly different way. At the very least write up a  
proposal of how the API should work and file a bug report. Even better  
is to provide a patch in the same bug.

-Damien

On Nov 1, 2009, at 6:27 AM, Brian Candler wrote:

> On Fri, Oct 30, 2009 at 08:37:36PM -0400, Adam Kocoloski wrote:
>> The response format is a slightly awkward Array -- I believe the first
>> revision is the winning one.
>
> This is definitely not true. For example, I can get the database into a
> state where there are three conflicting versions, and the winning version
> is the middle one returned by open_revs=all
>
> $ curl http://127.0.0.1:5984/conflict_test/test?conflicts=true
> {"_id":"test","_rev":"10-ca7d1f8bc31d20ad4898e4024547c4c6","hello":"baz","_conflicts":["9-9b4928288c4fc3087833549752bdcdc3","9-2438fa9601119c8fe2c0b2f4a688f65f"]}
> $ curl http://127.0.0.1:5984/conflict_test/test?open_revs=all
> [{"ok":{"_id":"test","_rev":"9-2438fa9601119c8fe2c0b2f4a688f65f","hello":"bar"}},
> {"ok":{"_id":"test","_rev":"10-ca7d1f8bc31d20ad4898e4024547c4c6","hello":"baz"}},
> {"ok":{"_id":"test","_rev":"9-9b4928288c4fc3087833549752bdcdc3","hello":"foo"}}]
>
> Code to do this is attached.
>
> Regards,
>
> Brian.
> <conflict-openrevs-order.rb>


Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Fri, Oct 30, 2009 at 08:37:36PM -0400, Adam Kocoloski wrote:
> The response format is a slightly awkward Array -- I believe the first  
> revision is the winning one.

This is definitely not true. For example, I can get the database into a
state where there are three conflicting versions, and the winning version
is the middle one returned by open_revs=all

$ curl http://127.0.0.1:5984/conflict_test/test?conflicts=true
{"_id":"test","_rev":"10-ca7d1f8bc31d20ad4898e4024547c4c6","hello":"baz","_conflicts":["9-9b4928288c4fc3087833549752bdcdc3","9-2438fa9601119c8fe2c0b2f4a688f65f"]}
$ curl http://127.0.0.1:5984/conflict_test/test?open_revs=all
[{"ok":{"_id":"test","_rev":"9-2438fa9601119c8fe2c0b2f4a688f65f","hello":"bar"}},
{"ok":{"_id":"test","_rev":"10-ca7d1f8bc31d20ad4898e4024547c4c6","hello":"baz"}},
{"ok":{"_id":"test","_rev":"9-9b4928288c4fc3087833549752bdcdc3","hello":"foo"}}]

Code to do this is attached.
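
Separately from the attached script, here is a rough Python sketch of how a
client could cope with the ordering shown above. It assumes the client makes
both requests (a plain GET, which returns the winning revision, and
open_revs=all, which returns every leaf) and then matches them up by _rev.
The function name order_revisions is hypothetical, and the two responses are
pasted in as literals from the output above so that the sketch runs without
a server.

```python
def order_revisions(winner_doc, open_revs):
    """Return the live leaf documents from open_revs=all, winner first."""
    docs = [entry["ok"] for entry in open_revs if "ok" in entry]
    # Leaves that are deletion stubs are not part of the live conflict set.
    live = [doc for doc in docs if not doc.get("_deleted")]
    # sorted() is stable: the winner moves to the front, the rest keep order.
    return sorted(live, key=lambda doc: doc["_rev"] != winner_doc["_rev"])


# Response to GET /conflict_test/test?conflicts=true
winner = {
    "_id": "test",
    "_rev": "10-ca7d1f8bc31d20ad4898e4024547c4c6",
    "hello": "baz",
    "_conflicts": [
        "9-9b4928288c4fc3087833549752bdcdc3",
        "9-2438fa9601119c8fe2c0b2f4a688f65f",
    ],
}

# Response to GET /conflict_test/test?open_revs=all
open_revs = [
    {"ok": {"_id": "test", "_rev": "9-2438fa9601119c8fe2c0b2f4a688f65f", "hello": "bar"}},
    {"ok": {"_id": "test", "_rev": "10-ca7d1f8bc31d20ad4898e4024547c4c6", "hello": "baz"}},
    {"ok": {"_id": "test", "_rev": "9-9b4928288c4fc3087833549752bdcdc3", "hello": "foo"}},
]

ordered = order_revisions(winner, open_revs)
for doc in ordered:
    print(doc["_rev"], doc["hello"])
```

Because the two requests are separate, the set of revisions can change
between them; a real client would have to handle a winner that no longer
appears in the second response.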

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Fri, Oct 30, 2009 at 08:37:36PM -0400, Adam Kocoloski wrote:
> I like where your head's at on this, Brian.  I should mention that it  
> *is* possible to retrieve all conflict revisions of a document with one 
> request:
>
> GET /db/bob?open_revs=all

Unfortunately, open_revs=all opens more than just the current conflicting
revisions, and the live one is not the first. For example, after branching
to three conflicting revisions and then merging them back into one, I get

  [{"ok": {"_deleted"=>true, ...},
   {"ok": {"_deleted"=>true, ...},
   {"ok": {...current doc...}

This is true after compaction as well. The attached program demonstrates
this.

Do we need a version of get_all_leafs which excludes _deleted members?

> The response format is a slightly awkward Array -- I believe the first  
> revision is the winning one.
>
> [{"ok":{"_id":"bob","_rev":"1-3453545",...}},{"ok": 
> {"_id":"bob","_rev":"1-23042"]

The "ok" tag isn't hard to strip, so this is just a minor annoyance. At
first I couldn't see why it was there at all, but it turns out that
open_revs also lets you list revisions explicitly:

$ curl 'http://127.0.0.1:5984/conflict_test/test?open_revs=%5b%221-2345%22%5d'
[{"missing":"1-2345"}]
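
For illustration, a small Python sketch of both halves of this: building the
percent-encoded open_revs URL and splitting the reply into documents found
and revisions missing. The helper names open_revs_url and split_open_revs
are hypothetical, and the sample reply mixing an "ok" and a "missing" entry
is an assumed combination of the two outputs shown in this thread, not a
captured response.

```python
import json
from urllib.parse import quote


def open_revs_url(db_url, doc_id, revs):
    """Build a GET URL asking for specific revisions of one document."""
    # open_revs takes a JSON array, which has to be percent-encoded.
    encoded = quote(json.dumps(revs, separators=(",", ":")))
    return f"{db_url}/{quote(doc_id, safe='')}?open_revs={encoded}"


def split_open_revs(response):
    """Separate the documents that were found from the revisions that were not."""
    found = [entry["ok"] for entry in response if "ok" in entry]
    missing = [entry["missing"] for entry in response if "missing" in entry]
    return found, missing


url = open_revs_url("http://127.0.0.1:5984/conflict_test", "test", ["1-2345"])
print(url)

# Assumed reply shape: one revision found, one not (illustrative only).
reply = [
    {"ok": {"_id": "test", "_rev": "10-ca7d1f8bc31d20ad4898e4024547c4c6", "hello": "baz"}},
    {"missing": "1-2345"},
]
found, missing = split_open_revs(reply)
print(len(found), "found;", "missing:", missing)
```

quote() emits upper-case hex digits (%5B rather than %5b); the two forms are
equivalent in a URL.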

A simpler API might be to return a _missing member for the requested doc
itself, similar to _deleted, e.g.
[{"_id":"test","_rev":"1-2345","_missing":true}]

> I think I'd be in favor of making the default GET include all conflicts, 
> but probably in the _conflicts field so as to minimize the changes to the 
> current API.  I'm not sure a multi-rev version of PUT is as urgent a 
> need.

The only downside of the current _bulk_docs way of resolving a conflict is
that it is asymmetrical. If you have (say) three conflicting revisions of a
document, then you have to update one and delete the other two, rather than
supersede all three. So you have to pick one as somehow "more important"
than the other two, or follow couchdb's arbitrary choice.

This might be an artefact of couchdb's revision history mechanism: I would
guess that it does not allow multiple ancestors of a single revision
(unlike, say, git, which lets you have a merge commit with multiple parents)

A quick test suggests this is true. If I set up the following conflict
sequence:

         ,--> r2a --> r3
       r1 --> r2b --> (deleted)
         `--> r2c --> (deleted)

then I query r3 with revs_info=yes, then I see only the linear sequence
r1-r2a-r3.
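
The single-parent model guessed at above can be pictured with a toy Python
sketch. This is not how CouchDB stores revisions internally; the parents
dictionary, the r3b_deleted and r3c_deleted names for the deletion revisions,
and the helper functions are all assumptions made for illustration. Each
revision records exactly one parent, so walking back from r3 can only ever
follow one branch.

```python
# Each revision points at exactly one parent (None for the root).
parents = {
    "r1": None,
    "r2a": "r1",
    "r2b": "r1",
    "r2c": "r1",
    "r3": "r2a",          # the resolved document, written on top of r2a
    "r3b_deleted": "r2b",  # deletion revision closing the r2b branch
    "r3c_deleted": "r2c",  # deletion revision closing the r2c branch
}


def history(rev):
    """Walk parent links from rev back to the root, oldest first."""
    path = []
    while rev is not None:
        path.append(rev)
        rev = parents[rev]
    return path[::-1]


def leaves():
    """Revisions that no other revision names as its parent."""
    has_child = set(parents.values())
    return sorted(rev for rev in parents if rev not in has_child)


print(history("r3"))
print(leaves())
```

With only one parent link per revision, nothing in r3's history records that
r2b and r2c were merged into it, which matches the linear r1-r2a-r3 sequence
reported above. A git-style merge would need r3 to list several parents.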

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by ericdes <er...@vcardprocessor.com>.
Yes, I thought about getting the conflict state while querying multiple 
documents, even without the include_docs=true. It would be interesting
to see how expensive such a request would be.

Alternatively a cheap way would be to flag a document as 'possibly 
conflicting' when a branch is created (i.e. an update made on the same 
revision) -- or isn't it just the case: merges are not done 
automatically, are they?

Eric.

> I'm not sure if you mean getting the conflict state without getting the
> entire document (i.e. without include_docs=true). It would be possible, but
> I don't know how expensive it would be. Perhaps couchdb would have to do as
> much tree walking to find the conflicting revs as it would to retrieve the
> entire document anyway.
>
> Regards,
>
> Brian.



Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Sun, Nov 08, 2009 at 12:13:27PM +0530, ericdes wrote:
> I'm currently  
> writing a .Net driver for CouchDB, and my point of view about making the  
> default GET include all conflicts is that it is not a bad idea at all  
> (even ideal), although a driver implementation may 'hide' that by  
> implementing a GetDocument method that would add the query option by  
> default.

I agree, this is something which client libraries can wrap and hide,
although I think it could be cleaner if there were a slightly different form
of open_revs=all (see COUCHDB-548)

> I think it's more annoying not to have the possibility of using the  
> conflict query 'conflicts=true' when requesting multiple documents (I  
> mean especially when you don't want to include the docs).

You could add a vote to
https://issues.apache.org/jira/browse/COUCHDB-549

I'm not sure if you mean getting the conflict state without getting the
entire document (i.e. without include_docs=true). It would be possible, but
I don't know how expensive it would be. Perhaps couchdb would have to do as
much tree walking to find the conflicting revs as it would to retrieve the
entire document anyway.

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by ericdes <er...@vcardprocessor.com>.
Brian,

I read your posts about conflicts with great interest. I'm currently 
writing a .Net driver for CouchDB, and my point of view about making the 
default GET include all conflicts is that it is not a bad idea at all 
(even ideal), although a driver implementation may 'hide' that by 
implementing a GetDocument method that would add the query option by 
default.

I think it's more annoying not to have the possibility of using the 
conflict query 'conflicts=true' when requesting multiple documents (I 
mean especially when you don't want to include the docs). For example, 
this could be used by a CouchDB explorer type application that would 
retrieve all record metadata and signal the conflicts with a specific 
icon (like tortoise SVN does when a document is out of sync for example).

Eric Desgranges.


On 11/1/2009 5:23 PM, Brian Candler wrote:
> On Sat, Oct 31, 2009 at 10:40:43AM +0000, Brian Candler wrote:
>>> I think I'd be in favor of making the default GET include all conflicts,
>>> but probably in the _conflicts field so as to minimize the changes to the
>>> current API.
>>
>> As a first step, that would be unlikely to break anyone, would make it clear
>> when a conflict exists, and would simplify the API by removing the need for
>> ?conflicts=true. So I'd vote for that.
>
> Also: if you query a view with ?include_docs=true, at the moment you never
> get a _conflicts member in the document, even if you say
> ?include_docs=true&conflicts=true
>
> Perhaps you should?
>
> B.
>


Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Sat, Oct 31, 2009 at 10:40:43AM +0000, Brian Candler wrote:
> > I think I'd be in favor of making the default GET include all conflicts, 
> > but probably in the _conflicts field so as to minimize the changes to the 
> > current API.
> 
> As a first step, that would be unlikely to break anyone, would make it clear
> when a conflict exists, and would simplify the API by removing the need for
> ?conflicts=true. So I'd vote for that.

Also: if you query a view with ?include_docs=true, at the moment you never
get a _conflicts member in the document, even if you say
?include_docs=true&conflicts=true

Perhaps you should?

B.

Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Fri, Oct 30, 2009 at 08:37:36PM -0400, Adam Kocoloski wrote:
> I like where your head's at on this, Brian.  I should mention that it  
> *is* possible to retrieve all conflict revisions of a document with one 
> request:
>
> GET /db/bob?open_revs=all

Ah, that's a new one to me. I will update the wiki when I get a chance.

As Damien pointed out, you could implement resolve-on-get in a client layer
in front of CouchDB. The problem, now that I look over my own code, is that I
don't actually use a plain GET very much at all. Usually I fetch documents
via a view and multi-key fetch.

I tried _all_docs?conflicts=true and I didn't get the conflicts (I haven't
tried _all_docs?open_revs=all yet). Maybe this doesn't make sense until the
view itself can receive all revs and decide how to merge the conflicts.

> I think I'd be in favor of making the default GET include all conflicts, 
> but probably in the _conflicts field so as to minimize the changes to the 
> current API.

As a first step, that would be unlikely to break anyone, would make it clear
when a conflict exists, and would simplify the API by removing the need for
?conflicts=true. So I'd vote for that.

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by Adam Kocoloski <ko...@apache.org>.
On Oct 30, 2009, at 9:15 AM, Brian Candler wrote:

> On Fri, Oct 30, 2009 at 08:33:52AM +0000, Brian Candler wrote:
>> Yes, I know patches are welcome. The reason I'm not contributing code for
>> this right now is that I have higher priorities - I'm happy to keep my app
>> 409-tied while I work on other things.
>
> On the other hand, I am happy to contribute some documentation. I just
> wrote the following page, which turned out to be longer than expected:
> http://wiki.apache.org/couchdb/Replication_and_conflicts
>
> Some interesting things came out while working this through. You can see an
> example Ruby script I wrote which replaces GET and PUT with multi-rev
> equivalents. I don't know of any existing client library which does this,
> but once you're clear what is needed, it's quite straightforward.
>
> Then if you use this, your application can forget about all 409 handling,
> because you'll never see one.
>
> I think turning this into a native API would be good - even if it's just
> GET ?all=true and PUT ?revs=[rev1,rev2,rev3]
>
> Regards,
>
> Brian.

I like where your head's at on this, Brian.  I should mention that it  
*is* possible to retrieve all conflict revisions of a document with  
one request:

GET /db/bob?open_revs=all

The response format is a slightly awkward Array -- I believe the first  
revision is the winning one.

[{"ok":{"_id":"bob","_rev":"1-3453545",...}},{"ok": 
{"_id":"bob","_rev":"1-23042"]

I think I'd be in favor of making the default GET include all  
conflicts, but probably in the _conflicts field so as to minimize the  
changes to the current API.  I'm not sure a multi-rev version of PUT  
is as urgent a need.  Best,

Adam


Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Fri, Oct 30, 2009 at 08:33:52AM +0000, Brian Candler wrote:
> Yes, I know patches are welcome. The reason I'm not contributing code for
> this right now is that I have higher priorities - I'm happy to keep my app
> 409-tied while I work on other things.

On the other hand, I am happy to contribute some documentation. I just
wrote the following page, which turned out to be longer than expected:
http://wiki.apache.org/couchdb/Replication_and_conflicts

Some interesting things came out while working this through. You can see an
example Ruby script I wrote which replaces GET and PUT with multi-rev
equivalents. I don't know of any existing client library which does this,
but once you're clear what is needed, it's quite straightforward.

Then if you use this, your application can forget about all 409 handling,
because you'll never see one.

I think turning this into a native API would be good - even if it's just
GET ?all=true and PUT ?revs=[rev1,rev2,rev3]

Regards,

Brian.

Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Thu, Oct 29, 2009 at 01:51:57PM -0400, Damien Katz wrote:
>> Is this a sensible API? You decide. I've given my opinion previously.
>
>
> This api seems weird, but it's the closest thing we can have to
> multi-document transactions in CouchDB and be a distributed, partitioned
> database. This is because it's pretty much impossible to support
> all-or-nothing conflict checking transactions with partitioned database
> without some sort of double-lock checking, which is slow and expensive.

I don't want to prevent conflicts, nor do I want transactions. As you say,
introducing conflicting revisions is a fact of life in a distributed-master
system.

However, I believe that CouchDB's API actively discourages people from
writing apps which deal with conflicts properly, by (a) hiding them, and (b)
making resolve-on-read a multi-step process (e.g. readA, readB, readC,
writeA, deleteB, deleteC) which itself is race-prone and may lead to more
conflicts and odd intermediate states (*)

What I would like to see is the following.

1. When you request document X, you get *all* conflicting revisions in one
   go. That is, they are treated as equal peers; none is promoted to winner.

   (However, the list can be sorted in a deterministic order, so you could
   get the current behaviour by just picking the first revision from the
   list)

2. When you perform this request, you get a single "context" tag
   which identifies this particular *set* of revisions.

3. When you write back the new document, you supply the context tag, and
   this simultaneously supersedes all the other documents. Effectively this
   would be like the _rev you use today, but it would refer to the set.
   It could actually just be an array of _revs, but the user should treat
   it as an opaque tag.

4. Views get to see the whole set of revisions too. Again, if they want
   today's behaviour they can just use docs[0] and ignore the others; but
   if they want to resolve conflicts they can too.

5. If two clients replace a document or set of conflicts with a new
   document, and the new documents are identical, then they are not
   treated as conflicts.

From the papers I've read, systems like Dynamo all seem to have properties
(1)-(3). That is: it's treated as natural that conflicts should arise; that
these are fully exposed to the client; and the client is given the
opportunity to resolve them in a single step.
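
As a rough illustration of points (2) and (3), the sketch below derives one
tag from a set of revisions by sorting them and emitting a JSON array, so the
same set always yields the same tag whatever order the revisions were listed
in. This is not an existing CouchDB API: context_tag is a made-up name, the
revs are invented, and a real implementation would presumably live on the
server rather than in a shell function.

```shell
# Hypothetical helper: turn a set of revs into one deterministic "context tag".
# Expects at least one rev as an argument.
context_tag() {
  printf '%s\n' "$@" |
    LC_ALL=C sort -u |   # fixed byte order, duplicates removed
    awk 'BEGIN { printf "[" }
         { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }
         END { print "]" }'
}

# The same set of revs gives the same tag regardless of argument order.
context_tag 2-bbb 2-aaa
context_tag 2-aaa 2-bbb
```

A client would hand this tag back unchanged with its write, much as it hands
back _rev now.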

> If you want an easier API for saving documents into a conflicted state  
> (something like ?conflict=ok), that would be a fairly easy patch to  
> make. But I'm not sure why users would want that for a single document.

I think that ultimately the 409 behaviour could be dropped if conflicts were
handled as above, but that's not my number one concern.

My concern is this:

* Someone writes an application

* They use the "obvious" API: i.e. simple GET and PUT for reading and
  updating documents. They code to the 409 for avoiding conflicts. It all
  works fine and they are delighted with couchdb.

* They switch to multi-master

* All hell breaks loose. Users see their docs vanishing. Application writer
  finally works out how to do conflict management properly, and has to
  rewrite the app entirely so that (for example) one GET becomes a
  GET with ?conflicts=true, followed by multiple GETs for the additional
  versions, followed by conflict resolution followed by a POST
  to _bulk_docs to replace the original document and conflicts.

* Application writer curses couchdb, and curses the person who wrote
  "Most applications require no special planning to take advantage of
  distributed updates and replication".
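
The last step of that rewrite, the POST to _bulk_docs, might be assembled
along these lines. This is only a sketch: resolve_payload is a hypothetical
helper, the document id and revs are invented, and it assumes the merged
document is written as a child of one revision while every other conflicting
revision is removed by posting it with "_deleted":true. It does nothing about
the races between the reads and the write.

```shell
# Hypothetical helper: build a _bulk_docs body that resolves a conflict.
#   $1 = document id
#   $2 = rev that the merged document replaces
#   $3 = JSON members of the merged document, without the outer braces
#   remaining arguments = conflicting revs to delete
resolve_payload() {
  id=$1 keep=$2 body=$3
  shift 3
  printf '{"docs":[{"_id":"%s","_rev":"%s",%s}' "$id" "$keep" "$body"
  for rev in "$@"; do
    printf ',{"_id":"%s","_rev":"%s","_deleted":true}' "$id" "$rev"
  done
  printf ']}\n'
}

# Example with invented revs: keep the 2-aaa branch, delete 2-bbb.
resolve_payload mydoc 2-aaa '"type":"test","data":"merged"' 2-bbb
```

The output could then be piped to something like
curl -sX POST -H 'Content-Type: application/json' -d @- "$DB/_bulk_docs",
where $DB is the database URL.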

What I propose above could be introduced incrementally with suitable flags,
but care would be needed to do this everywhere [e.g. not just simple GET but
also multi-key fetches]. Of course there's a lot of detail that would need
to be worked through.

Yes, I know patches are welcome. The reason I'm not contributing code for
this right now is that I have higher priorities - I'm happy to keep my app
409-tied while I work on other things. But at the back of my mind, I know
that I won't be going multi-master for a long time, if ever.

Regards,

Brian.

(*) Yes, I know that *with care* you can do the writes and deletes together
as a single _bulk_docs operation, and even bind them together using
"all_or_nothing":true. But this is not obvious. And there are still races.
For example, I'm not sure that you can use a multi-key fetch for getting all
the conflicting revisions in one hit, so you have a series of GETs, and you
may find that the revs you're GETting have vanished by the time you read
them.

Re: What happens with a document, if a conflict is not resolved?

Posted by Damien Katz <da...@apache.org>.
On Oct 29, 2009, at 12:30 PM, Brian Candler wrote:

> On Thu, Oct 29, 2009 at 07:28:33AM +0100, fana wrote:
>> I read the book, Wiki and some Blogs about CouchDB,
>> but there is still a question in my mind.
>>
>> If a document is in conflict, the application has to resolve it.
>> But what, if this never happens?
>
> All the conflicting versions remain around, even through compaction. However
> if you request a document by ID, by default you will get an arbitrary
> revision. The algorithm is the same across all nodes, so all nodes will see
> the same. The "winning" document is also the one seen by views.
>
>> Can the document in conflict still be read and edited?
>
> Yes. Conflicts branch into a tree. When you've resolved a conflict, you need
> to delete the conflicting revisions explicitly.
>
> Example:
>
>    X0
>
> User 1 fetches X0 and updates it to X1. User 2 fetches X0 and updates it to
> X2. Then you get:
>
>      ,-> X1
>    X0
>      `-> X2
>
> If either user reads, they will see one of the versions (say X1). They won't
> even know that there's a conflict unless they query with ?conflicts=true, in
> which case they'll see the rev of X2 as well, but would need to do a second
> read to get the contents of X2.
>
> If the database is compacted then the common ancestor X0 will be lost
> forever, but X1 and X2 will still remain. (Hence you can't rely on doing a
> diff between X0 and X1, and another diff between X0 and X2, to merge the
> changes).

If you want DVCS-like full diffing, then one way is to attach a diff and
revision metadata for each edit to the document before PUTing it. When
there is a conflict, the revision history is completely available for
inspection, and the user can see where the conflicting edit began, etc.

>
> If a user edits X1 and saves back as X3, you will get
>
>      ,-> X1 -> X3
>    X0
>      `-> X2
>
> Now X2 and X3 are in conflict. The conflict may be resolved in favour of X3;
> actually, I don't know the details of the algorithm, so it might be possible
> for it to be resolved in favour of X2, which means that the changes seen in
> X1 and X3 would both appear to "vanish" at that point.

The one with more edits wins, which prevents the arbitrary
disappearance of documents during normal editing.
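
A rough sketch of that rule, assuming revs have the N-hash form (where N
counts the edits on that branch) and that ties are broken by taking the
highest hash in plain byte order. The tie-break and the name pick_winner are
assumptions for illustration, the revs are invented, and the sketch ignores
deleted revisions.

```shell
# Hypothetical helper: pick the winning rev from a set of conflicting revs.
# Assumes the "N-hash" format; the highest N wins, ties go to the highest hash.
pick_winner() {
  printf '%s\n' "$@" |
    LC_ALL=C sort -t- -k1,1nr -k2,2r |   # edit count descending, then hash descending
    head -n 1
}

# X3 (three edits) against X2 (two edits): the branch with more edits wins.
pick_winner 2-x2hash 3-x3hash
```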

>
> Note: if you are running on a single node, then by default, conflicting
> updates are forbidden with a 409 error. But you can get them in two ways: by
> making the changes on two separate nodes and replicating the nodes to each
> other; or by using the _bulk_docs API with {"all_or_nothing":true}.
>
> The second case is used in the following shell script, so this may be a good
> starting point for experimentation.
>
> ---- 8< -------------
> HOST=http://127.0.0.1:5984
> DB="$HOST/conflict_test"
> EP="$DB/_bulk_docs"
> curl -s "$HOST"
> curl -sX DELETE "$DB"
> curl -sX PUT "$DB"
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "type":"test"
> }]}
> JSON
> rev0=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev0
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"test",
> "data":"foo"
> }]}
> JSON
> rev1=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev1
>
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev0",
> "type":"wibble",
> "data":"bar"
> }]}
> JSON
> rev2=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev2
>
> # Now we have two conflicting versions.
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> echo
> echo "Getting the auto-selected version with 'revs_info':"
> curl -s "$DB/mydoc?revs_info=true"
>
> # Note that you would have to retrieve the conflicting versions yourself
>
> echo "Now updating version $rev1"
> resp=$(curl -sX POST -d @- $EP <<JSON)
> {"all_or_nothing":true,"docs":[{
> "_id":"mydoc",
> "_rev":"$rev1",
> "type":"test",
> "data":"baz"
> }]}
> JSON
> rev3=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
> echo $rev3
>
> echo
> echo "Getting the auto-selected version:"
> curl -s "$DB/mydoc"
> echo
> echo "Getting the auto-selected version with 'conflicts':"
> curl -s "$DB/mydoc?conflicts=true"
> ---- 8< -------------
>
> Is this a sensible API? You decide. I've given my opinion previously.


This API seems weird, but it's the closest thing we can have to
multi-document transactions in CouchDB while remaining a distributed,
partitioned database. This is because it's pretty much impossible to support
all-or-nothing conflict-checking transactions with a partitioned database
without some sort of double-lock checking, which is slow and expensive.
Also, replication doesn't replicate transactions, only documents, so we
don't wish to confuse users by introducing transactions that aren't
supported by the rest of CouchDB.

If you want an easier API for saving documents into a conflicted state  
(something like ?conflict=ok), that would be a fairly easy patch to  
make. But I'm not sure why users would want that for a single document.

Thanks for this write-up; you seem to have given a good high-level
description of how conflicts work in CouchDB.

-Damien


>
> HTH,
>
> Brian.


Re: What happens with a document, if a conflict is not resolved?

Posted by Brian Candler <B....@pobox.com>.
On Thu, Oct 29, 2009 at 07:28:33AM +0100, fana wrote:
> I read the book, Wiki and some Blogs about CouchDB,
> but there is still a question in my mind.
> 
> If a document is in conflict, the application has to resolve it.
> But what, if this never happens?

All the conflicting versions remain around, even through compaction. However
if you request a document by ID, by default you will get an arbitrary
revision. The algorithm is the same across all nodes, so all nodes will see
the same. The "winning" document is also the one seen by views.

> Can the document in conflict still be read and edited?

Yes. Conflicts branch into a tree. When you've resolved a conflict, you need
to delete the conflicting revisions explicitly.

Example:

    X0

User 1 fetches X0 and updates it to X1. User 2 fetches X0 and updates it to
X2. Then you get:

      ,-> X1
    X0
      `-> X2

If either user reads, they will see one of the versions (say X1). They won't
even know that there's a conflict unless they query with ?conflicts=true, in
which case they'll see the rev of X2 as well, but would need to do a second
read to get the contents of X2.
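
A sketch of that two-step read: pull the revs out of the "_conflicts" member,
then fetch each one with ?rev=. conflict_revs is a hypothetical helper and the
sample document is invented; the sed pattern assumes compact JSON with no
whitespace inside the array, so a real JSON parser would be safer.

```shell
# Hypothetical helper: print each rev in a document's "_conflicts" array,
# one per line. Prints nothing if the document has no "_conflicts" member.
conflict_revs() {
  printf '%s\n' "$1" |
    sed -n 's/.*"_conflicts":\[\([^]]*\)].*/\1/p' |   # keep only the array contents
    tr ',' '\n' |                                     # one rev per line
    tr -d '"'                                         # strip the JSON quotes
}

# Invented response body, shaped like the output of GET mydoc?conflicts=true
doc='{"_id":"mydoc","_rev":"2-aaa","data":"foo","_conflicts":["2-bbb","2-ccc"]}'
conflict_revs "$doc"

# Second step, one extra read per conflicting rev:
#   for r in $(conflict_revs "$doc"); do curl -s "$DB/mydoc?rev=$r"; done
```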

If the database is compacted then the common ancestor X0 will be lost
forever, but X1 and X2 will still remain. (Hence you can't rely on doing a
diff between X0 and X1, and another diff between X0 and X2, to merge the
changes).

If a user edits X1 and saves back as X3, you will get

      ,-> X1 -> X3
    X0
      `-> X2

Now X2 and X3 are in conflict. The conflict may be resolved in favour of X3;
actually, I don't know the details of the algorithm, so it might be possible
for it to be resolved in favour of X2, which means that the changes seen in
X1 and X3 would both appear to "vanish" at that point.

Note: if you are running on a single node, then by default, conflicting
updates are forbidden with a 409 error. But you can get them in two ways: by
making the changes on two separate nodes and replicating the nodes to each
other; or by using the _bulk_docs API with {"all_or_nothing":true}.

The second case is used in the following shell script, so this may be a good
starting point for experimentation.

---- 8< -------------
HOST=http://127.0.0.1:5984
DB="$HOST/conflict_test"
EP="$DB/_bulk_docs"
curl -s "$HOST"
curl -sX DELETE "$DB"
curl -sX PUT "$DB"

resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"type":"test"
}]}
JSON
rev0=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev0

resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev0",
"type":"test",
"data":"foo"
}]}
JSON
rev1=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev1

resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev0",
"type":"wibble",
"data":"bar"
}]}
JSON
rev2=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev2

# Now we have two conflicting versions.
echo
echo "Getting the auto-selected version:"
curl -s "$DB/mydoc"
echo
echo "Getting the auto-selected version with 'conflicts':"
curl -s "$DB/mydoc?conflicts=true"
echo
echo "Getting the auto-selected version with 'revs_info':"
curl -s "$DB/mydoc?revs_info=true"

# Note that you would have to retrieve the conflicting versions yourself

echo "Now updating version $rev1"
resp=$(curl -sX POST -d @- $EP <<JSON)
{"all_or_nothing":true,"docs":[{
"_id":"mydoc",
"_rev":"$rev1",
"type":"test",
"data":"baz"
}]}
JSON
rev3=`expr "$resp" : '.*"rev":"\([^"]*\)"'`
echo $rev3

echo
echo "Getting the auto-selected version:"
curl -s "$DB/mydoc"
echo
echo "Getting the auto-selected version with 'conflicts':"
curl -s "$DB/mydoc?conflicts=true"
---- 8< -------------

Is this a sensible API? You decide. I've given my opinion previously.

HTH,

Brian.