Posted to dev@couchdb.apache.org by Hagen Overdick <si...@gmail.com> on 2009/04/01 14:23:10 UTC

Re: Bulk updates and eventual consistency

>
> IMO this is a questionable decision, but I'm in the minority.


I guess, after much thought about this, I am joining the minority.

I base my argument on this excellent paper:
http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf

In essence, Pat Helland recommends identifying _entities_, which represent
the maximum scope of _local_ serializability. Thanks to the bulk update
mechanism, this used to be a whole CouchDB database; with the changes given,
an entity now maps to a single document.

The reason given here is sharding a single database, a concept I would
reject, because it breaks the idea of a database as an entity in the first
place. Btw, the reasoning that led to the removal of bulk_transactions can
be applied to the single update as well: there is just no guarantee that
there won't be a conflicting update somewhere in the distributed environment.
Also, I don't really see how you want to provide all_or_nothing semantics
assuming a sharded database.

So, what's an entity for CouchDB? I very much prefer a whole db, because then
I can have partial updates (which is exactly what the old bulk_transaction
provided). I don't want to use this for referential integrity, but for local
serializability of updates. If you remove that, you will either force people
into bad design (keeping everything in a single document and eventually
asking for partial updates) or force them to replicate this functionality
outside of CouchDB, leading to ugly crutches.


Just my 2 Eurocents
Hagen
-- 
Dissertations are a successful walk through a minefield -- summarizing them
is not. - Roy Fielding

Re: Bulk updates and eventual consistency

Posted by Antony Blakey <an...@gmail.com>.
I am attempting to keep this branch reasonably up-to-date with trunk:
http://github.com/AntonyBlakey/couchdb/tree/transactional_bulk_docs
It provides transactional _bulk_docs if you add "fail_on_conflict": true to
the top-level JSON body in the request. It fails with a 419 if there are any
conflicts (and makes no db changes). The conflict data is threaded back to
the HTTP response point with the intention of returning it as the response
body, but I've not done that yet. Patches welcome.
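
For illustration, here is a minimal Python sketch of the request body
described above. The helper names (build_bulk_body, is_rejected) and the
sample documents are hypothetical; only the "fail_on_conflict" flag and the
419 status come from the description above, and both apply to this branch
rather than to stock CouchDB.

```python
import json

# Status the branch is described as returning when any document conflicts.
CONFLICT_STATUS = 419


def build_bulk_body(docs, fail_on_conflict=True):
    """Return the JSON text for a _bulk_docs request body.

    "fail_on_conflict" sits at the top level, next to "docs".
    """
    body = {"docs": list(docs)}
    if fail_on_conflict:
        body["fail_on_conflict"] = True
    return json.dumps(body, sort_keys=True)


def is_rejected(status):
    """True if the whole batch was refused because of a conflict."""
    return status == CONFLICT_STATUS


if __name__ == "__main__":
    # Hypothetical documents; the _rev values are made up.
    docs = [
        {"_id": "account-a", "_rev": "1-abc", "balance": 90},
        {"_id": "account-b", "_rev": "1-def", "balance": 110},
    ]
    print(build_bulk_body(docs))
```

The resulting text would be POSTed to the database's _bulk_docs resource. On
a 419 the client should assume that none of the documents were written.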

The mod is designed to be easy to maintain wrt. trunk.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

He who would make his own liberty secure, must guard even his enemy  
from repression.
   -- Thomas Paine



Re: Bulk updates and eventual consistency

Posted by Chris Anderson <jc...@apache.org>.
On Wed, Apr 1, 2009 at 5:23 AM, Hagen Overdick <si...@gmail.com> wrote:
>>
>> IMO this is a questionable decision, but I'm in the minority.
>
>
>  Guess, after much thought about this, I am joining the minority.
>
> I base my argumentation on this excellent paper:
> http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf
>
> In essence, Pat Helland recommends to identify _entities_ which represent
> the maximum scope of _local_ serializability. Thanks to the bulk update
> mechanism, this used to be a whole couchdb, with the changes given, an
> entity maps to a single document now.

Correct. CouchDB is a key/value store. A database is just a namespace
for keys, and the boundary of map/reduce operations.

> So, what's an entity for CouchDB? I very much prefer a whole db

Exposing the bulk-transactions API was perhaps a mistake in managing
expectations. My impression is that it was exposed because it made testing
some low-level file behavior more convenient in the short term.

To provide an alternate viewpoint on this question, I remember using
CouchDB _before_ bulk-docs became transactional, and being
disappointed that what used to be an easy way to get data into CouchDB
was now failing even if just one of my documents had a conflict. In
the old days, bulk-docs worked a lot like it does in 0.9, and I found
this more relaxing for my web spidering use case.
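
To make the two behaviours contrasted above concrete, here is a toy
in-memory sketch in Python. It is not CouchDB code: the Store class, its
method names, and the integer revisions are assumptions made purely for
illustration (real revisions are opaque strings). It only shows the
difference between per-document and all-or-nothing handling of a conflict.

```python
class Store:
    """Toy document store mapping id -> (rev, body)."""

    def __init__(self):
        self.docs = {}

    def _conflicts(self, doc):
        # A write conflicts when its _rev is not the currently stored rev.
        current = self.docs.get(doc["_id"])
        current_rev = current[0] if current else 0
        return doc.get("_rev", 0) != current_rev

    def _write(self, doc):
        new_rev = doc.get("_rev", 0) + 1
        body = {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
        self.docs[doc["_id"]] = (new_rev, body)

    def bulk_per_document(self, docs):
        """Each document succeeds or fails alone; return conflicting ids."""
        conflicted = []
        for doc in docs:
            if self._conflicts(doc):
                conflicted.append(doc["_id"])
            else:
                self._write(doc)
        return conflicted

    def bulk_all_or_nothing(self, docs):
        """Write nothing if any document conflicts; return conflicting ids."""
        conflicted = [doc["_id"] for doc in docs if self._conflicts(doc)]
        if not conflicted:
            for doc in docs:
                self._write(doc)
        return conflicted


if __name__ == "__main__":
    store = Store()
    store.bulk_per_document([{"_id": "a", "n": 1}])  # "a" is now at rev 1
    # The update to "a" omits _rev, so it is stale; "b" is new.
    batch = [{"_id": "a", "n": 2}, {"_id": "b", "n": 3}]
    print(store.bulk_all_or_nothing(batch), sorted(store.docs))
    print(store.bulk_per_document(batch), sorted(store.docs))
```

With the all-or-nothing call, the stale update to "a" keeps "b" from being
stored as well. With the per-document call, "b" goes in and only "a" is
reported back, which is the more forgiving behaviour for a spidering-style
import.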

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io