Posted to dev@couchdb.apache.org by Alexander Uvarov <al...@gmail.com> on 2011/12/22 16:57:58 UTC

Is it possible to bring back optional old all-or-nothing behaviour?

With release 0.9 of CouchDB, the bulk update semantics were changed
so that a CouchDB server no longer rejects updates in case of conflicts.
IIRC the old behavior was removed because it does not scale, and that
change drove away a large number of users who wanted to simplify their
development by dropping SQL and ORMs in some cases.
The current all-or-nothing behavior does not work in BigCouch by
Cloudant; I guess it does not shard. It does not shard, but it still
exists. So why not take the chance to make bulk docs A_C_ID?
Why not bring the previous behavior back as an option? It would greatly
simplify development of simple apps, apps for smartphones, and apps with
one database per small number of clients, where sharding, clustering and
replication will never find a place; in other words, "single master" apps.

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Randall Leeds <ra...@gmail.com>.

On Thu, Dec 22, 2011 at 20:46, Randall Leeds <ra...@gmail.com> wrote:
> More to the point though... I find replication is one of CouchDB's
> killer features and that's why some devs (like me and Paul) would
> rather see all_or_nothing vanish completely. If you need relational

Didn't mean to put words in Paul's mouth. It was Robert who said he'd
rather remove it.

> consistency but not replication you might be better served elsewhere.
> I won't tell you to go away (I love our users, and so I'm offering a
> lesser-known workaround with ?new_edits) but I won't mislead you about
> the goals of the project either.

On second reading I hope this didn't come across as harsh. I'm always
glad to help and offer alternatives. Furthermore, you're always
welcome to add such features yourself and perhaps there's community
interest in maintaining such a patch, but I would guess it's unlikely
to see upstream inclusion and probably fruitless to ask for the
committers to put time toward it.

-Randall

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Randall Leeds <ra...@gmail.com>.

On Thu, Dec 22, 2011 at 20:46, Randall Leeds <ra...@gmail.com> wrote:
> On Thu, Dec 22, 2011 at 20:18, Alexander Uvarov
> <al...@gmail.com> wrote:
>>
>> On Dec 23, 2011, at 1:49 AM, Paul Davis wrote:
>>
>>> On Thu, Dec 22, 2011 at 11:31 AM, Robert Newson <rn...@apache.org> wrote:
>>>> In my opinion, and I believe the majority opinion of the group, the
>>>> CouchDB API should be the same everywhere. This specifically includes
>>>> not doing things on a single box that will not work in a
>>>> clustered/sharded situation. It's why our transactions are scoped to a
>>>> single document, for example.
>>>>
>>>> I will also note that all_or_nothing does not provide multi-document
>>>> ACID transactions. The batches used in bulk_docs are not recorded, so
>>>> those items will be replicated individually (and in parallel, so not
>>>> even in a predictable order), which would break the C and I
>>>> characteristics on the receiving server. The old semantic would abort
>>>> the whole update if any one of the documents couldn't be updated but
>>>> the new semantic simply introduces a conflict in that case.
>>>>
>>>
>>> Slight nit pick, but new behavior just returns the error that the
>>> update would *cause* the conflict. (Assuming default non-replicator
>>> _bulk_docs calls.)
>>>
>>
>> Am I missing something? The current bulk_docs implementation introduces a conflict when there is a conflict; it does not just reject the update and return an error.
>>
>>>> B.
>>>>
>>>> On 22 December 2011 16:48, Alexander Uvarov <al...@gmail.com> wrote:
>>>>> And can become much easier with multi-document transactions as an option.
>>>>>
>>>>> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
>>>>>> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>>>>>>
>>>
>>> Robert mentions it in passing, but the biggest reason that we dropped
>>> the original _bulk_docs behavior doesn't have anything to do with
>>> clustering. It was because the semantics are violated as soon as you
>>> try and replicate. Since there's no tracking of the group of docs
>>> posted to _bulk_docs then as soon as your mobile client tried to move
>>> data in or out you'd lose all three of ACI in ACID.
>>
>> Won't every system with a multi-master architecture cause problems as soon as you try to replicate? Should this force people to design for replication even when they don't need it? In my first message I mentioned that not every application needs to be replicated. There are thousands of such apps in the world. Even if it's possible to design an app for replication, it can be very hard to do, and the developer and probably future users will spend a lot of time on something superfluous.
>
> It's possible, but expensive, to have multi-master architecture and
> transaction isolation, but it involves distributed commit protocols.
>
> The wiki documentation is maybe slightly misleading in that the
> guarantees provided by the current Apache CouchDB around
> all_or_nothing have nothing to do with database crashes. All
> _bulk_docs requests are written as a single group commit with a single
> database header write, so either all valid, non-conflicting writes are
> durably stored or none are. all_or_nothing lets validation functions
> reject the whole bulk rather than just the failing write, and then
> during the commit phase create conflicts rather than returning an
> error.
>
> Here's the key: if your documents are known to be valid (or you don't
> have a validate_doc_update function in your database), then the
> difference is only whether or not conflicts are created or rejected,
> not whether all writes hit disk durably or not, as the wiki might seem
> to suggest.
>
> The replicator uses a flag on the query parameter to create conflicts
> rather than rejecting them: ?new_edits=false. If you can tolerate
> conflicts please feel free to create your own revision ids (bump the
> leading number, create a random id, and slap them together with a
> dash) and use ?new_edits=false. You'll get the same semantics with
> respect to conflicts as all_or_nothing. You lose little by generating
> your own revision ids since deterministic revisions are an optimization
> for replication. Maybe that lets you move forward with your use case.
>
> More to the point though... I find replication is one of CouchDB's
> killer features and that's why some devs (like me and Paul) would
> rather see all_or_nothing vanish completely. If you need relational
> consistency but not replication you might be better served elsewhere.
> I won't tell you to go away (I love our users, and so I'm offering a
> lesser-known workaround with ?new_edits) but I won't mislead you about
> the goals of the project either.
>
> -Randall

I didn't realize when I wrote this that new_edits is actually
documented [1]. I hope that helps!

Cheers,
Randall

[1] https://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Posting_Existing_Revisions

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Randall Leeds <ra...@gmail.com>.

On Thu, Dec 22, 2011 at 20:18, Alexander Uvarov
<al...@gmail.com> wrote:
>
> On Dec 23, 2011, at 1:49 AM, Paul Davis wrote:
>
>> On Thu, Dec 22, 2011 at 11:31 AM, Robert Newson <rn...@apache.org> wrote:
>>> In my opinion, and I believe the majority opinion of the group, the
>>> CouchDB API should be the same everywhere. This specifically includes
>>> not doing things on a single box that will not work in a
>>> clustered/sharded situation. It's why our transactions are scoped to a
>>> single document, for example.
>>>
>>> I will also note that all_or_nothing does not provide multi-document
>>> ACID transactions. The batches used in bulk_docs are not recorded, so
>>> those items will be replicated individually (and in parallel, so not
>>> even in a predictable order), which would break the C and I
>>> characteristics on the receiving server. The old semantic would abort
>>> the whole update if any one of the documents couldn't be updated but
>>> the new semantic simply introduces a conflict in that case.
>>>
>>
>> Slight nit pick, but new behavior just returns the error that the
>> update would *cause* the conflict. (Assuming default non-replicator
>> _bulk_docs calls.)
>>
>
> Am I missing something? The current bulk_docs implementation introduces a conflict when there is a conflict; it does not just reject the update and return an error.
>
>>> B.
>>>
>>> On 22 December 2011 16:48, Alexander Uvarov <al...@gmail.com> wrote:
>>>> And can become much easier with multi-document transactions as an option.
>>>>
>>>> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
>>>>> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>>>>>
>>
>> Robert mentions it in passing, but the biggest reason that we dropped
>> the original _bulk_docs behavior doesn't have anything to do with
>> clustering. It was because the semantics are violated as soon as you
>> try and replicate. Since there's no tracking of the group of docs
>> posted to _bulk_docs then as soon as your mobile client tried to move
>> data in or out you'd lose all three of ACI in ACID.
>
> Won't every system with a multi-master architecture cause problems as soon as you try to replicate? Should this force people to design for replication even when they don't need it? In my first message I mentioned that not every application needs to be replicated. There are thousands of such apps in the world. Even if it's possible to design an app for replication, it can be very hard to do, and the developer and probably future users will spend a lot of time on something superfluous.

It's possible, but expensive, to have multi-master architecture and
transaction isolation, but it involves distributed commit protocols.

The wiki documentation is maybe slightly misleading in that the
guarantees provided by the current Apache CouchDB around
all_or_nothing have nothing to do with database crashes. All
_bulk_docs requests are written as a single group commit with a single
database header write, so either all valid, non-conflicting writes are
durably stored or none are. all_or_nothing lets validation functions
reject the whole bulk rather than just the failing write, and then
during the commit phase create conflicts rather than returning an
error.

Here's the key: if your documents are known to be valid (or you don't
have a validate_doc_update function in your database), then the
difference is only whether or not conflicts are created or rejected,
not whether all writes hit disk durably or not, as the wiki might seem
to suggest.

The replicator uses a flag on the query parameter to create conflicts
rather than rejecting them: ?new_edits=false. If you can tolerate
conflicts please feel free to create your own revision ids (bump the
leading number, create a random id, and slap them together with a
dash) and use ?new_edits=false. You'll get the same semantics with
respect to conflicts as all_or_nothing. You lose little by generating
your own revision ids since deterministic revisions are an optimization
for replication. Maybe that lets you move forward with your use case.
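
The revision-id recipe above (bump the leading number, create a random
id, join them with a dash) could be sketched roughly as follows. This is
only an illustration: the function names are hypothetical and not part
of CouchDB, and the 32-character hex suffix is an assumption chosen to
resemble the ids the server generates. Sending the resulting documents
to _bulk_docs with new_edits=false is left out.

```python
import re
import secrets

# Revision ids look like "<generation>-<hex suffix>", e.g. "3-abc123".
REV_PATTERN = re.compile(r"^(\d+)-([0-9a-f]+)$")


def next_revision(current_rev=None):
    """Return a client-generated revision id (hypothetical helper).

    The leading generation number is bumped by one and paired with a
    random hex suffix. With no current revision, generation 1 is used.
    """
    if current_rev is None:
        generation = 0
    else:
        match = REV_PATTERN.match(current_rev)
        if match is None:
            raise ValueError(f"unrecognized revision id: {current_rev!r}")
        generation = int(match.group(1))
    return f"{generation + 1}-{secrets.token_hex(16)}"


def with_new_revisions(docs):
    """Copy each doc, replacing _rev with a client-generated successor."""
    return [{**doc, "_rev": next_revision(doc.get("_rev"))} for doc in docs]
```

Because the suffix is random rather than derived from the document
content, two clients writing the same change will produce different
revision ids, which is the replication optimization being given up.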

More to the point though... I find replication is one of CouchDB's
killer features and that's why some devs (like me and Paul) would
rather see all_or_nothing vanish completely. If you need relational
consistency but not replication you might be better served elsewhere.
I won't tell you to go away (I love our users, and so I'm offering a
lesser-known workaround with ?new_edits) but I won't mislead you about
the goals of the project either.

-Randall

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Alexander Uvarov <al...@gmail.com>.

On Dec 23, 2011, at 1:49 AM, Paul Davis wrote:

> On Thu, Dec 22, 2011 at 11:31 AM, Robert Newson <rn...@apache.org> wrote:
>> In my opinion, and I believe the majority opinion of the group, the
>> CouchDB API should be the same everywhere. This specifically includes
>> not doing things on a single box that will not work in a
>> clustered/sharded situation. It's why our transactions are scoped to a
>> single document, for example.
>> 
>> I will also note that all_or_nothing does not provide multi-document
>> ACID transactions. The batches used in bulk_docs are not recorded, so
>> those items will be replicated individually (and in parallel, so not
>> even in a predictable order), which would break the C and I
>> characteristics on the receiving server. The old semantic would abort
>> the whole update if any one of the documents couldn't be updated but
>> the new semantic simply introduces a conflict in that case.
>> 
> 
> Slight nit pick, but new behavior just returns the error that the
> update would *cause* the conflict. (Assuming default non-replicator
> _bulk_docs calls.)
> 

Am I missing something? The current bulk_docs implementation introduces a conflict when there is a conflict; it does not just reject the update and return an error.

>> B.
>> 
>> On 22 December 2011 16:48, Alexander Uvarov <al...@gmail.com> wrote:
>>> And can become much easier with multi-document transactions as an option.
>>> 
>>> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
>>>> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>>>> 
> 
> Robert mentions it in passing, but the biggest reason that we dropped
> the original _bulk_docs behavior doesn't have anything to do with
> clustering. It was because the semantics are violated as soon as you
> try and replicate. Since there's no tracking of the group of docs
> posted to _bulk_docs then as soon as your mobile client tried to move
> data in or out you'd lose all three of ACI in ACID.

Won't every system with a multi-master architecture cause problems as soon as you try to replicate? Should this force people to design for replication even when they don't need it? In my first message I mentioned that not every application needs to be replicated. There are thousands of such apps in the world. Even if it's possible to design an app for replication, it can be very hard to do, and the developer and probably future users will spend a lot of time on something superfluous.

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Paul Davis <pa...@gmail.com>.

On Thu, Dec 22, 2011 at 11:31 AM, Robert Newson <rn...@apache.org> wrote:
> In my opinion, and I believe the majority opinion of the group, the
> CouchDB API should be the same everywhere. This specifically includes
> not doing things on a single box that will not work in a
> clustered/sharded situation. It's why our transactions are scoped to a
> single document, for example.
>
> I will also note that all_or_nothing does not provide multi-document
> ACID transactions. The batches used in bulk_docs are not recorded, so
> those items will be replicated individually (and in parallel, so not
> even in a predictable order), which would break the C and I
> characteristics on the receiving server. The old semantic would abort
> the whole update if any one of the documents couldn't be updated but
> the new semantic simply introduces a conflict in that case.
>

Slight nit pick, but new behavior just returns the error that the
update would *cause* the conflict. (Assuming default non-replicator
_bulk_docs calls.)
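
For a default _bulk_docs call, a client sees this as one result row per
document, where rejected rows carry an "error" field. A minimal sketch
of separating the two kinds of row is below. The sample response is
hand-written for illustration (the ids and revision strings are made
up), and split_bulk_results is a hypothetical helper, not a CouchDB API.

```python
def split_bulk_results(results):
    """Partition _bulk_docs result rows into accepted and rejected."""
    accepted = [row for row in results if "error" not in row]
    rejected = [row for row in results if "error" in row]
    return accepted, rejected


# Illustrative response: the first write was stored, the second was
# rejected because it would have caused a conflict.
sample_response = [
    {"id": "invoice-1", "rev": "2-7051cbe5c8faecd085a3fa619e6e6337"},
    {"id": "order-1", "error": "conflict",
     "reason": "Document update conflict."},
]

accepted, rejected = split_bulk_results(sample_response)
```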

> B.
>
> On 22 December 2011 16:48, Alexander Uvarov <al...@gmail.com> wrote:
>> And can become much easier with multi-document transactions as an option.
>>
>> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
>>> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>>>

Robert mentions it in passing, but the biggest reason that we dropped
the original _bulk_docs behavior doesn't have anything to do with
clustering. It was because the semantics are violated as soon as you
try and replicate. Since there's no tracking of the group of docs
posted to _bulk_docs then as soon as your mobile client tried to move
data in or out you'd lose all three of ACI in ACID.

The follow-up question that I spent some time on was whether there is
a way to *add* these bulk group indicators to solve this in
the replicator. As it turns out, the way our update_seq indices work
is fairly at odds with this grouping. When documents are updated, they
are moved to the new update_seq position. Without some major
reengineering of the core of couchdb (that directly relates to
replication) there isn't much that we can do here.
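
To make the update_seq point concrete, here is a toy model. It is an
assumption-laden simplification, not CouchDB's actual storage code:
ToySeqIndex and its methods are invented for this sketch. Each write
moves the document to a fresh sequence number, so documents written
together stop being adjacent as soon as one of them is updated again.

```python
class ToySeqIndex:
    """Toy by-sequence index: each doc appears once, at its latest seq."""

    def __init__(self):
        self.update_seq = 0
        self.seq_of = {}  # doc id -> current sequence number
        self.by_seq = {}  # sequence number -> doc id

    def write(self, doc_id):
        # An updated doc leaves its old position and moves to the end.
        old_seq = self.seq_of.get(doc_id)
        if old_seq is not None:
            del self.by_seq[old_seq]
        self.update_seq += 1
        self.seq_of[doc_id] = self.update_seq
        self.by_seq[self.update_seq] = doc_id

    def bulk_write(self, doc_ids):
        for doc_id in doc_ids:
            self.write(doc_id)

    def changes(self):
        """Return (seq, doc id) pairs in sequence order."""
        return [(seq, self.by_seq[seq]) for seq in sorted(self.by_seq)]


index = ToySeqIndex()
index.bulk_write(["order", "invoice"])  # written as one group
index.write("other")
index.write("order")                    # later update to one group member
```

After these writes, index.changes() lists "invoice" at 2, "other" at 3
and "order" at 4, so nothing in the index still ties "order" and
"invoice" together.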

Generally speaking, my rule of thumb is that if you find yourself
wanting this feature then you're probably going to want to rethink
your application's architecture. When we went through this discussion
back in the 0.9 days I found myself spending a lot of time trying to
think of new designs that would save it. Then I slowly realized that I
just hadn't completely grokked the replication/distribution model in
CouchDB.

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Robert Newson <rn...@apache.org>.

In my opinion, and I believe the majority opinion of the group, the
CouchDB API should be the same everywhere. This specifically includes
not doing things on a single box that will not work in a
clustered/sharded situation. It's why our transactions are scoped to a
single document, for example.

I will also note that all_or_nothing does not provide multi-document
ACID transactions. The batches used in bulk_docs are not recorded, so
those items will be replicated individually (and in parallel, so not
even in a predictable order), which would break the C and I
characteristics on the receiving server. The old semantic would abort
the whole update if any one of the documents couldn't be updated but
the new semantic simply introduces a conflict in that case.
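
As a rough illustration of the replication point, the toy simulation
below copies the documents of one batch to a target one at a time, in
arbitrary order, and records what a reader on the target could observe
after each step. The account data and the function name are invented
for this sketch, and real replication is far more involved.

```python
import random


def replicate_individually(source, target, rng):
    """Copy docs one at a time in arbitrary order, yielding each state."""
    doc_ids = list(source)
    rng.shuffle(doc_ids)
    for doc_id in doc_ids:
        target[doc_id] = source[doc_id]
        yield dict(target)


# On the source, one batch moved 50 from alice to bob (total stays 100).
source = {"alice": 50, "bob": 50}
# The target still holds the state from before that batch.
target = {"alice": 100, "bob": 0}

states = list(replicate_individually(source, target, random.Random()))
totals = [sum(state.values()) for state in states]
```

Whichever document arrives first, the first intermediate total is 50 or
150 rather than 100, so a reader on the target can see half of the
batch. Only the final state adds up to 100 again.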

B.

On 22 December 2011 16:48, Alexander Uvarov <al...@gmail.com> wrote:
> And can become much easier with multi-document transactions as an option.
>
> On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
>> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>>

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Alexander Uvarov <al...@gmail.com>.

And can become much easier with multi-document transactions as an option.

On Thu, Dec 22, 2011 at 10:43 PM, Pepijn de Vos <pe...@yahoo.com> wrote:
> But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.
>

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Pepijn de Vos <pe...@yahoo.com>.

But not everyone needs a cluster. I like CouchDB because it's easy, not because "it scales", and in some situations, all_or_nothing is easy.

Pepijn

On Dec 22, 2011, at 5:36 PM, Robert Newson wrote:

> The semantics of the all_or_nothing option (old and new) are,
> respectively, impossible and very hard to achieve in a cluster.
> 
> I would like to see all_or_nothing removed entirely in CouchDB 2.0.
> 
> B.
> 
> On 22 December 2011 15:57, Alexander Uvarov <al...@gmail.com> wrote:
>> With release 0.9 of CouchDB, the bulk update semantics were changed
>> so that a CouchDB server no longer rejects updates in case of conflicts.
>> IIRC the old behavior was removed because it does not scale, and that
>> change drove away a large number of users who wanted to simplify their
>> development by dropping SQL and ORMs in some cases.
>> The current all-or-nothing behavior does not work in BigCouch by
>> Cloudant; I guess it does not shard. It does not shard, but it still
>> exists. So why not take the chance to make bulk docs A_C_ID?
>> Why not bring the previous behavior back as an option? It would greatly
>> simplify development of simple apps, apps for smartphones, and apps with
>> one database per small number of clients, where sharding, clustering and
>> replication will never find a place; in other words, "single master" apps.

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Alexander Uvarov <al...@gmail.com>.

Shouldn't users decide whether they want to run in a cluster or not? I
think a warning in the docs about all_or_nothing and clusters is good
enough. Why are you so strictly in favor of removing the feature?

On Thu, Dec 22, 2011 at 10:36 PM, Robert Newson <rn...@apache.org> wrote:
> The semantics of the all_or_nothing option (old and new) are,
> respectively, impossible and very hard to achieve in a cluster.
>
> I would like to see all_or_nothing removed entirely in CouchDB 2.0.
>
> B.
>

Re: Is it possible to bring back optional old all-or-nothing behaviour?

Posted by Robert Newson <rn...@apache.org>.

The semantics of the all_or_nothing option (old and new) are,
respectively, impossible and very hard to achieve in a cluster.

I would like to see all_or_nothing removed entirely in CouchDB 2.0.

B.

On 22 December 2011 15:57, Alexander Uvarov <al...@gmail.com> wrote:
> With release 0.9 of CouchDB, the bulk update semantics were changed
> so that a CouchDB server no longer rejects updates in case of conflicts.
> IIRC the old behavior was removed because it does not scale, and that
> change drove away a large number of users who wanted to simplify their
> development by dropping SQL and ORMs in some cases.
> The current all-or-nothing behavior does not work in BigCouch by
> Cloudant; I guess it does not shard. It does not shard, but it still
> exists. So why not take the chance to make bulk docs A_C_ID?
> Why not bring the previous behavior back as an option? It would greatly
> simplify development of simple apps, apps for smartphones, and apps with
> one database per small number of clients, where sharding, clustering and
> replication will never find a place; in other words, "single master" apps.