Posted to user@couchdb.apache.org by Paul Hirst <pa...@sophos.com> on 2012/04/18 10:51:27 UTC

Making conflicts first class citizens

I saw this idea on allourideas.org:

"Conflicts as first class citizens: Surface the conflict on read, and always accept a write, assuming it passes validation."

I was wondering if anyone could expand on this?

At the moment, conflicting writes are rejected, which is really handy from a simplicity point of view, and in many use cases it's a good enough solution. If you use the all_or_nothing:true option through the bulk API, then you can currently write conflicting documents, and this is (as I understand it) exactly what replication does.

So, is this idea about changing the default behaviour to act like the all_or_nothing option? Does it get rid of the ability to detect and reject conflicts at write time? Lastly, why does anyone want it when we seem to have the best of both worlds at the moment?

________________________________
Sophos Limited, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 991 2418 08.
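As a point of reference, here is a minimal sketch of the request body Paul describes, assuming CouchDB 1.x's `POST /{db}/_bulk_docs` endpoint. The helper name `bulk_docs_body` and the sample document are hypothetical, and the comment reflects Paul's understanding of the flag, which later replies in the thread dispute.

```python
import json


def bulk_docs_body(docs, all_or_nothing=False):
    """Build a JSON body for POST /{db}/_bulk_docs (hypothetical helper)."""
    body = {"docs": docs}
    if all_or_nothing:
        # As Paul understands it, this flag lets conflicting revisions be
        # stored instead of rejected (the flag is slated for deprecation).
        body["all_or_nothing"] = True
    return json.dumps(body)


docs = [{"_id": "invoice-1", "_rev": "1-abc", "total": 42}]
print(bulk_docs_body(docs, all_or_nothing=True))
```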

Re: Making conflicts first class citizens

Posted by Matt Goodall <ma...@gmail.com>.
On 18 April 2012 10:03, Robert Newson <rn...@apache.org> wrote:
> Hi Paul,
>
> I expanded on it here: https://gist.github.com/2387973
>
> (As an aside, the all_or_nothing:true setting for bulk_docs is being
> deprecated.)
>
> The motivation for all these changes is that knowledge of conflicts,
> and how to handle them, with the current API ends up happening very
> late in the development cycle and, frequently, *post* development and
> into production and maintenance. We think it better to make the
> multi-master model, and its consequences, clearer and simpler to all
> up front. It's a great model but you need to understand it; hiding
> portions of it by default is a hindrance.

I'm really hoping this change goes ahead (I voted for it). Assuming
multi-master replication from the start is definitely a better
approach with CouchDB in my experience. (I've seriously considered
using _bulk_docs with all_or_nothing:true for *every* update to
achieve a similar effect.)

Another item in the giant todo list that will reduce the chance of
conflicts happening is partial updates of documents. In fact, I would
love to see the following implemented together:

  * Conflicts as first class citizens
  * Partial updates of documents (single doc and bulk docs API please ;-))
  * all_or_nothing:true removal

- Matt

Re: Making conflicts first class citizens

Posted by Robert Newson <rn...@apache.org>.
Hi Paul,

I expanded on it here: https://gist.github.com/2387973

(As an aside, the all_or_nothing:true setting for bulk_docs is being
deprecated.)

The motivation for all these changes is that knowledge of conflicts,
and how to handle them, with the current API ends up happening very
late in the development cycle and, frequently, *post* development and
into production and maintenance. We think it better to make the
multi-master model, and its consequences, clearer and simpler to all
up front. It's a great model but you need to understand it; hiding
portions of it by default is a hindrance.

This is not to say that you won't be able to hide these things with a
setting. The full discussion and design of these items has not yet
taken place, so it's too early to say.

B.

Re: Making conflicts first class citizens

Posted by Matthieu Rakotojaona <ma...@gmail.com>.
If I understand correctly, the `all_or_nothing:true` parameter does two things:

* ensure that all the updates pass validation, push them one after
the other to the db, and if any of them fails, roll back to the state
before the attempt
* store all the conflicts

I see two problems with that:

First, the parameter name is misleading: it makes a newcomer think
that either all the documents will be stored or none will be, which is
the first point. But storing the conflicts seems out of its scope; it
would be more logical to use something like "everything_or_nothing" to
cover documents AND conflicts, or "all_or_none" to cover only
documents. After all, this is called on the `_bulk_docs`
resource.

Second, conflicts are only stored by default when
using `all_or_nothing`. In the CAP triangle, CouchDB sacrifices
Consistency to provide Availability and Partition tolerance. If you
sacrifice Consistency, you WILL generate conflicts; it is the rule,
not the exception, just as the gist says. At the moment, if one wants
to (properly) modify a doc, he has to get the current version of the
document before storing it (by specifying the correct _rev he wants to
update). But as the db might not be consistent, the version he gets
might not even be the current one. Moreover, it requires 2 operations
for a write (although I couldn't tell whether this is standard
in other databases). So even if he thinks there is no conflict with the
server he is writing to, there might be conflicts with other nodes.
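The read-then-write cycle Matthieu describes can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not CouchDB's implementation: real revision ids use a digest of the document rather than a random suffix, and `Conflict` merely stands in for the HTTP 409 response.

```python
import uuid


class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 Conflict response."""


class ToyDb:
    """In-memory toy of single-node MVCC writes; not CouchDB's real code."""

    def __init__(self):
        self.docs = {}  # doc id -> (rev, body)

    def get(self, doc_id):
        rev, body = self.docs[doc_id]
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        # The write must name the revision it was based on; a stale rev,
        # or a missing rev on an existing doc, is rejected.
        if rev != current_rev:
            raise Conflict(doc_id)
        gen = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        new_rev = f"{gen}-{uuid.uuid4().hex[:8]}"
        self.docs[doc_id] = (new_rev, body)
        return new_rev


db = ToyDb()
first = db.put("doc", {"n": 1})
current = db.get("doc")                  # read first to learn the current _rev
db.put("doc", {"n": 2}, rev=current["_rev"])
try:
    db.put("doc", {"n": 3}, rev=first)   # based on a stale revision
except Conflict:
    print("rejected: stale revision")
```

Note that this model only sees one node: a write it accepts can still conflict with edits made on another replica, which is Matthieu's point.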

From this point of view, I think that CouchDB should store conflicts
on every modification, and tell the client if there are conflicts
after a write, for instance by adding a `has_conflicts:true` field
to the returned JSON if needed. When the application receives this,
the user can be informed immediately and a process can be started to
choose the right revision, but only if needed: other conflicts might
occur when he replicates from another database.

-- 
Matthieu RAKOTOJAONA
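A minimal sketch of Matthieu's proposal, assuming a toy in-memory revision store: every write that names a known parent revision (or none) is accepted, a write against an older revision opens a sibling branch, and the response carries the suggested `has_conflicts` flag. `has_conflicts` and the class are hypothetical; `_conflicts` mirrors the field real CouchDB returns when a read asks for `?conflicts=true`, though here it is always surfaced. The winner rule (highest generation, then highest revision id) is meant to roughly mimic CouchDB's deterministic choice.

```python
import uuid


class ConflictSurfacingDb:
    """Toy store for the proposal: accept every write, report conflicts.

    Hypothetical sketch; has_conflicts follows Matthieu's suggestion and
    is not an existing CouchDB response field.
    """

    def __init__(self):
        self.revs = {}    # doc id -> {rev: body}
        self.leaves = {}  # doc id -> set of leaf revs

    def put(self, doc_id, body, rev=None):
        revs = self.revs.setdefault(doc_id, {})
        leaves = self.leaves.setdefault(doc_id, set())
        if rev is not None and rev not in revs:
            raise KeyError(f"unknown parent revision {rev}")
        gen = int(rev.split("-")[0]) + 1 if rev else 1
        new_rev = f"{gen}-{uuid.uuid4().hex[:8]}"
        revs[new_rev] = body
        # Extending a leaf replaces it; extending an older rev (or writing
        # without a rev) opens a new branch instead of being rejected.
        leaves.discard(rev)
        leaves.add(new_rev)
        return {"ok": True, "id": doc_id, "rev": new_rev,
                "has_conflicts": len(leaves) > 1}

    def get(self, doc_id):
        leaves = self.leaves[doc_id]
        # Deterministic winner: highest generation, then highest rev id.
        winner = max(leaves, key=lambda r: (int(r.split("-")[0]), r))
        doc = {"_id": doc_id, "_rev": winner, **self.revs[doc_id][winner]}
        losers = sorted(leaves - {winner})
        if losers:
            doc["_conflicts"] = losers  # surfaced on every read
        return doc


db = ConflictSurfacingDb()
r1 = db.put("doc", {"v": "a"})["rev"]
db.put("doc", {"v": "b"}, rev=r1)          # has_conflicts is False
resp = db.put("doc", {"v": "c"}, rev=r1)   # concurrent edit of r1
print(resp["has_conflicts"])               # True
print(len(db.get("doc")["_conflicts"]))    # 1
```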

Re: Making conflicts first class citizens

Posted by Paul Davis <pa...@gmail.com>.
On Wed, Apr 18, 2012 at 9:49 AM, Paul Hirst <pa...@sophos.com> wrote:
>> I wouldn't completely remove the ability to reject conflicts on write.
>> When I proposed this idea I was thinking first and foremost about
>> surfacing them on reads.  On writes I want a new option that looks a
>> lot like all_or_nothing:true but without the transactional bit where if
>> any document in the batch fails validation the whole batch is rejected.
>> Probably too early to say whether that new option would be the default.
>> I definitely agree with the other responses that planning for conflicts
>> from the start is a better approach to working with CouchDB.
>
> I disagree. While I think dealing with conflicts properly often happens too late in development (and I'm guilty of this myself), I think being forced to deal with it from the start is unhelpful.
>
> I think it's fair and right to start out with 'strong consistency on one site' and then move onto 'full master/master replication across sites with eventual consistency and powerful conflict resolution'. For many use cases you don't need anything other than 'one site' so you can keep it simple. Surely that's more relaxing?
>

The issue here is that what most people don't completely understand
right away is that even a single site doesn't have strong consistency
guarantees once you get to replication. Regardless of multi-master or
whatever configuration, any use of replication can introduce conflicts
and what ends up happening is that people never realize this until it's
a problem.

While we would be introducing a different default behavior, the old
version is a simple If-Match header away which would provide the same
behavior we currently have in a more standard HTTP model.
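Paul's If-Match point can be shown with the standard library alone. This sketch only builds the request; it assumes a server that accepts the revision in an `If-Match` header on document `PUT`s, as later CouchDB releases document, and the URL, database and document names are placeholders.

```python
import json
import urllib.parse
import urllib.request


def conditional_put(base_url, db, doc_id, body, rev):
    """Build (but do not send) a PUT that should only succeed if `rev` is current."""
    data = json.dumps(body).encode("utf-8")
    url = f"{base_url}/{db}/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json", "If-Match": rev},
    )


req = conditional_put("http://localhost:5984", "mydb", "doc1", {"n": 2}, "1-abc")
# urllib normalises header names with str.capitalize(), hence "If-match".
print(req.get_method(), req.full_url, req.get_header("If-match"))
# urllib.request.urlopen(req) would send it; a stale rev is expected to be rejected.
```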

Also I should point out that until people have concurrent writers on a
single doc, this will behave much more like people have asked, i.e.,
don't specify a revision and we'll automatically use the previous
value so that people don't have to GET the current doc just to
forcefully overwrite with a PUT.
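The proposed default can be modelled in a few lines. This is a hypothetical toy covering only the single-writer case Paul describes: a `PUT` without a revision builds on whatever is current, and supplying `if_match` restores today's check. `StaleRevision` stands in for whatever HTTP error the server would return; concurrent writers, which would create branches, are not modelled.

```python
import uuid


class StaleRevision(Exception):
    """Stands in for the error returned when If-Match names a stale rev."""


class ProposedWriteDb:
    """Toy model of the proposed default (hypothetical, not CouchDB code)."""

    def __init__(self):
        self.current = {}  # doc id -> (rev, body); single writer, one leaf

    def put(self, doc_id, body, if_match=None):
        prev = self.current.get(doc_id)
        prev_rev = prev[0] if prev else None
        # Opt-in old behaviour: only write if the caller's rev is current.
        if if_match is not None and if_match != prev_rev:
            raise StaleRevision(doc_id)
        # No revision given: build on the current revision automatically,
        # so a blind overwrite needs no prior GET.
        gen = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
        new_rev = f"{gen}-{uuid.uuid4().hex[:8]}"
        self.current[doc_id] = (new_rev, body)
        return new_rev


db = ProposedWriteDb()
r1 = db.put("doc", {"n": 1})
db.put("doc", {"n": 2})                     # overwrite without a GET
try:
    db.put("doc", {"n": 3}, if_match=r1)    # stale precondition
except StaleRevision:
    print("rejected")
```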

> The 'strong consistency' message of Couchbase is actually rather compelling, but when you introduce cross-site replication they don't have 'powerful conflict resolution', plus they've thrown out most of CouchDB's coolness. I fully understand the technical reasons for this, but as a developer I'm left thinking +2 for strong consistency and crazy performance but -10 for losing Couchapps, _changes, revisioning, REST, etc.
>
> Anyway, I didn't write this to have a moan about Couchbase, because I think it's a pretty awesome product when you view it as something completely different to CouchDB. The point I'm really arguing is that strong consistency is simpler. In many cases CouchDB's current behaviour provides strong consistency out of the box, but if you need to move into multi-master replication then CouchDB has the perfect tools to rescue you from what would otherwise be a world of pain.
>
> However, I agree that the developer should be able to choose the behaviour and be able to write conflicting documents without having to use confusing options in the bulk API to achieve it.
>
> Paul

As an aside, all_or_nothing as discussed earlier is not correct. This
setting is ~never used in my experience, and even those of us that
know about it have a hard time describing what it's for. It was only
ever added for non-technical reasons related to a different feature
that had to be removed.

The new_edits:false option, though, is the one that allows writes without
calculating a new revision, which may or may not be necessary
given the new behavior.
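For comparison with `all_or_nothing`, a sketch of a `_bulk_docs` body using `new_edits:false`, assuming the CouchDB 1.x request shape. The helper name is hypothetical, and the `_rev` check is a precaution of this sketch rather than a documented server rule.

```python
import json


def replicator_style_body(docs):
    """Body for POST /{db}/_bulk_docs with new_edits:false (hypothetical helper).

    With new_edits:false the server keeps the _rev the client supplies
    instead of generating a new one, which is what replication relies on.
    """
    for doc in docs:
        # Precaution in this sketch: without a client-supplied _rev there
        # is no existing revision to preserve.
        if "_rev" not in doc:
            raise ValueError(f"document {doc.get('_id')!r} has no _rev")
    return json.dumps({"new_edits": False, "docs": docs})


print(replicator_style_body([{"_id": "doc1", "_rev": "2-def", "n": 2}]))
```

If the target already holds a different revision of the same document, the supplied one is kept as a separate edit branch rather than rejected, which is how replication ends up creating conflicts.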

RE: Making conflicts first class citizens

Posted by Paul Hirst <pa...@sophos.com>.
> I wouldn't completely remove the ability to reject conflicts on write.
> When I proposed this idea I was thinking first and foremost about
> surfacing them on reads.  On writes I want a new option that looks a
> lot like all_or_nothing:true but without the transactional bit where if
> any document in the batch fails validation the whole batch is rejected.
> Probably too early to say whether that new option would be the default.
> I definitely agree with the other responses that planning for conflicts
> from the start is a better approach to working with CouchDB.

I disagree. While I think dealing with conflicts properly often happens too late in development (and I'm guilty of this myself), I think being forced to deal with it from the start is unhelpful.

I think it's fair and right to start out with 'strong consistency on one site' and then move onto 'full master/master replication across sites with eventual consistency and powerful conflict resolution'. For many use cases you don't need anything other than 'one site' so you can keep it simple. Surely that's more relaxing?

The 'strong consistency' message of Couchbase is actually rather compelling, but when you introduce cross-site replication they don't have 'powerful conflict resolution', plus they've thrown out most of CouchDB's coolness. I fully understand the technical reasons for this, but as a developer I'm left thinking +2 for strong consistency and crazy performance but -10 for losing Couchapps, _changes, revisioning, REST, etc.

Anyway, I didn't write this to have a moan about Couchbase, because I think it's a pretty awesome product when you view it as something completely different to CouchDB. The point I'm really arguing is that strong consistency is simpler. In many cases CouchDB's current behaviour provides strong consistency out of the box, but if you need to move into multi-master replication then CouchDB has the perfect tools to rescue you from what would otherwise be a world of pain.

However, I agree that the developer should be able to choose the behaviour and be able to write conflicting documents without having to use confusing options in the bulk API to achieve it.

Paul

Re: Making conflicts first class citizens

Posted by Adam Kocoloski <ko...@apache.org>.
I wouldn't completely remove the ability to reject conflicts on write.  When I proposed this idea I was thinking first and foremost about surfacing them on reads.  On writes I want a new option that looks a lot like all_or_nothing:true but without the transactional bit where if any document in the batch fails validation the whole batch is rejected.  Probably too early to say whether that new option would be the default.  I definitely agree with the other responses that planning for conflicts from the start is a better approach to working with CouchDB.

As an aside, replication does something related but slightly different -- in addition to creating a new edit branch if required, it uses new_edits:false to bypass the generation of a new _rev.

Adam
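The per-document option Adam describes, contrasted with the transactional `all_or_nothing:true` behaviour, can be sketched as follows. Both functions are hypothetical; `validate` stands in for a design document's validation function, `store` is a plain list, and the `"forbidden"` error string mirrors the error name CouchDB uses for failed validation.

```python
from typing import Callable, Dict, List


def write_batch(docs: List[Dict], validate: Callable[[Dict], bool],
                store: List[Dict]) -> List[Dict]:
    """Per-document batch write (hypothetical sketch of Adam's option).

    One failing document does not reject the batch; each document gets its
    own result. Conflicts are not checked, so every document that passes
    validation is accepted.
    """
    results = []
    for doc in docs:
        if validate(doc):
            store.append(doc)
            results.append({"id": doc["_id"], "ok": True})
        else:
            results.append({"id": doc["_id"], "error": "forbidden"})
    return results


def all_or_nothing_batch(docs: List[Dict], validate: Callable[[Dict], bool],
                         store: List[Dict]) -> bool:
    """Contrast: the transactional variant writes nothing if any doc fails."""
    if all(validate(d) for d in docs):
        store.extend(docs)
        return True
    return False


validate = lambda d: "title" in d
store: List[Dict] = []
print(write_batch([{"_id": "a", "title": "x"}, {"_id": "b"}], validate, store))
```

With the same input, `all_or_nothing_batch` would return `False` and leave `store` empty, because document `b` fails validation.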