Posted to user@couchdb.apache.org by Brian Candler <B....@pobox.com> on 2009/11/01 23:07:29 UTC

all_or_nothing=true and replication

At http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API I see the
following:

"Bulk updates work independently of replication, meaning document revisions
originally saved as part of an all or nothing transaction will be replicated
individually, not as part of a bulk transaction. This means other replica
instances may only have a subset of the transaction, and if an update is
rejected by the remote node during replication (e.g. not authorized error)
the remote node may never have the complete transaction."

I had a vague idea from the original discussions that these transactions
would remain together, but this appears to be wrong.

I'd just like to ask if the lack of boundaries for replication is
intentional behaviour, or just an artefact of the current implementation
which might change?

I can think of circumstances where it might be useful to keep them together.
Consider, for example, an all_or_nothing transaction which is used to
resolve a conflict between three documents: it does this by writing a new
revision, and deleting the other two revisions.

If this set of updates became split upon replication, it might end up
deleting the two old revisions but not updating the other document; thus you
would have lost data from those two revisions.
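
For concreteness, the transaction described above could be built roughly as follows. This is only a sketch: the document id, the revision strings and the helper name build_resolution_request are made up for illustration, and the body follows the all_or_nothing form described on the HTTP_Bulk_Document_API page (it would be POSTed to the database's _bulk_docs URL).

```python
import json

def build_resolution_request(doc_id, winning_rev, losing_revs, merged_body):
    """Build a _bulk_docs body that writes the merged revision on top of one
    conflicting revision and deletes the other conflicting revisions."""
    docs = [dict(merged_body, _id=doc_id, _rev=winning_rev)]
    for rev in losing_revs:
        docs.append({"_id": doc_id, "_rev": rev, "_deleted": True})
    return {"all_or_nothing": True, "docs": docs}

# Hypothetical id and revisions, for illustration only.
request = build_resolution_request(
    "mydoc", "2-aaa", ["2-bbb", "2-ccc"], {"value": "merged"}
)
print(json.dumps(request, indent=2))
```

If the three entries in "docs" are replicated one by one, a target node can end up applying the two deletions without the merged revision, which is the data loss in question.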

Any thoughts or comments?

Regards,

Brian.

Re: all_or_nothing=true and replication

Posted by Brian Candler <B....@pobox.com>.

On Tue, Nov 03, 2009 at 09:44:44AM -0500, Adam Kocoloski wrote:
>> On a related point: is it possible to configure a database to stop people
>> *pulling* certain documents? For example, if I want to allow people to read
>> and replicate user documents but not _design documents?
>
> Not at the moment.  We've had some proposals for document-granularity  
> ACLs.  The sticking point often ends up being the view indexing -- e.g. 
> what privileges does it have, and how do we keep it from exposing data 
> that would otherwise be restricted from a user?

Here's a simple suggestion.

1. just let views work as normal; except
2. prevent include_docs=true working for docs which the user would not be
   able to retrieve otherwise

For people who don't care that things may appear in the index which the user
can't subsequently retrieve, that's fine.

For people who do care, perhaps they can block direct access to the view and
force the user to go via a _list function which filters it.
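
As a rough model of the two steps in the suggestion above (plain Python, not CouchDB code; the function name, the can_read callback and the choice of showing a withheld document as null are all assumptions made for illustration):

```python
def filter_view_rows(rows, can_read):
    """Model of the suggestion: keep every index row, but blank out the
    embedded document when the user could not fetch it directly."""
    result = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if "doc" in row and not can_read(row["id"]):
            row["doc"] = None  # assumption: a withheld document is shown as null
        result.append(row)
    return result

# Hypothetical rows, shaped like a view result fetched with include_docs=true.
rows = [
    {"id": "user1", "key": "a", "value": 1, "doc": {"_id": "user1"}},
    {"id": "secret", "key": "b", "value": 2, "doc": {"_id": "secret"}},
]
visible = filter_view_rows(rows, can_read=lambda doc_id: doc_id != "secret")
```

The key and value of the restricted row are still returned, which is exactly the exposure that people who do care would have to handle separately.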

For me, I only care that _design docs aren't visible, and those don't end up
in views anyway. Unfortunately, just blocking URLs with _design isn't
sufficient protection, since there are other ways of getting them, e.g. via
_all_docs

The only solution I can think of for now is to do a partial replication to
another database, and let users pull from that one.
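
The selection step of such a partial replication might look like the sketch below. It only decides which document ids would be copied to the public database; how the copy itself is performed is left open, and the function names are hypothetical.

```python
def is_design_doc(doc_id):
    """Design documents are the ones whose id starts with '_design/'."""
    return doc_id.startswith("_design/")

def ids_to_publish(doc_ids):
    """Return the ids that may be copied to the public database."""
    return [doc_id for doc_id in doc_ids if not is_design_doc(doc_id)]

# Hypothetical ids, for illustration only.
published = ids_to_publish(["_design/app", "user1", "user2"])
print(published)
```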

Regards,

Brian.

Re: all_or_nothing=true and replication

Posted by Adam Kocoloski <ko...@apache.org>.

On Nov 3, 2009, at 4:43 AM, Brian Candler wrote:

> On Sun, Nov 01, 2009 at 05:36:18PM -0500, Damien Katz wrote:
>> Yes, you can make a situation where somehow a user has legal updates to
>> certain conflicts, but not others, on a particular node A. On some other
>> node B, somehow the security was different and he was allowed to update
>> all the docs. Then an attempt to merge all the conflicts into the
>> document the user didn't really have edit access to will not be
>> replicated from node B to node A.
>>
>> It's a contrived situation, but possible with misconfigured or updated
>> security settings that haven't propagated.
>
> This is definitely something that I need to add to the
> Replication_and_conflicts page.
>
> When you talk about security mechanisms, I know of "validate_doc_update",
> but are there other things which can affect whether a document is replicated
> or not? (e.g. I've heard talk of a filtered _changes feed, I don't know if
> that's implemented yet). I'd like to make sure I cover all bases.

Filtered _changes feeds have been implemented, although Benoit noted a  
problem with the continuous version.  The replicator doesn't yet know  
how to consume them, but it will soon.

> On a related point: is it possible to configure a database to stop people
> *pulling* certain documents? For example, if I want to allow people to read
> and replicate user documents but not _design documents?

Not at the moment.  We've had some proposals for document-granularity  
ACLs.  The sticking point often ends up being the view indexing --  
e.g. what privileges does it have, and how do we keep it from exposing  
data that would otherwise be restricted from a user?

Best,

Adam

Re: all_or_nothing=true and replication

Posted by Brian Candler <B....@pobox.com>.

On Sun, Nov 01, 2009 at 05:36:18PM -0500, Damien Katz wrote:
> Yes, you can make a situation where somehow a user has legal updates to 
> certain conflicts, but not others, on a particular node A. On some other 
> node B, somehow the security was different and he was allowed to update 
> all the docs. Then an attempt to merge all the conflicts into the 
> document the user didn't really have edit access to will not be
> replicated from node B to node A.
>
> It's a contrived situation, but possible with misconfigured or updated  
> security settings that haven't propagated.

This is definitely something that I need to add to the
Replication_and_conflicts page.

When you talk about security mechanisms, I know of "validate_doc_update",
but are there other things which can affect whether a document is replicated
or not? (e.g. I've heard talk of a filtered _changes feed, I don't know if
that's implemented yet). I'd like to make sure I cover all bases.

On a related point: is it possible to configure a database to stop people
*pulling* certain documents? For example, if I want to allow people to read
and replicate user documents but not _design documents?

Thanks,

Brian.

Re: all_or_nothing=true and replication

Posted by Brian Candler <B....@pobox.com>.

On Sun, Nov 01, 2009 at 05:36:18PM -0500, Damien Katz wrote:
> It's intentional behavior. Documents are meant to be independent, even
> conflicts.
...
> However, if you are doing this as a VCS where you keep all the diffs, then
> simply put the edit history as diffs into even the deleted documents.  
> Deleted docs actually can have a body and attachments, for just this  
> reason. Then no data is lost even when weird security settings disallow 
> only certain replicated edits.

OK, I can see that there are workarounds for specific cases. But then the
same workarounds could be used on a single-node system too, without using
all_or_nothing.

What I mean is: what exactly is the value of bundling a series of updates
together with all_or_nothing, if they become unbundled again upon
replication? If your application relies on all_or_nothing grouping to work
properly, wouldn't you just end up making it unable to work in a
multi-master environment?

Let me propose another example. Say you want to insert an entity A and two
child entities B1 and B2 - and you have a good reason for not wanting to
incorporate them into the same document. You don't want B1 or B2 to be
orphaned in the database without their parent A, so you insert them together
with all_or_nothing.

After replication, in the current model you *could* end up with orphans,
either because of update policy on the second node, or because of failures
occurring at just the wrong time.

Would it not be reasonable for them to replicate as a unit, and all fail to
replicate if any one is rejected? This would give you the same semantics
across the replication boundary.

I realise that you can get these semantics by replicating a single document
{A,[B1,B2]}. But if you were able to write your app that way, you wouldn't
have needed all_or_nothing in the first place, would you?
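
The difference between the two behaviours can be illustrated with a small in-memory model. This is a simulation in plain Python, not CouchDB's replicator: the databases are dicts, the "parent" field and the policy on the second node are invented for the example, and replicate_as_unit models the proposal rather than anything that exists.

```python
def replicate_individually(docs, target, accepts):
    """Copy each document on its own; a rejected document is simply skipped
    (the behaviour described in the thread)."""
    for doc in docs:
        if accepts(doc):
            target[doc["_id"]] = doc

def replicate_as_unit(docs, target, accepts):
    """Copy the group only if every document is accepted (the proposal)."""
    if all(accepts(doc) for doc in docs):
        for doc in docs:
            target[doc["_id"]] = doc

def orphans(db):
    """Ids of documents whose 'parent' is missing from the same database."""
    return sorted(d["_id"] for d in db.values()
                  if "parent" in d and d["parent"] not in db)

# The A / B1 / B2 transaction from the example above.
transaction = [
    {"_id": "A", "type": "parent"},
    {"_id": "B1", "parent": "A"},
    {"_id": "B2", "parent": "A"},
]

def accepts(doc):
    # Hypothetical update policy on the second node: it refuses parents.
    return doc.get("type") != "parent"

node_individual, node_unit = {}, {}
replicate_individually(transaction, node_individual, accepts)
replicate_as_unit(transaction, node_unit, accepts)
print(orphans(node_individual))  # ['B1', 'B2']
print(orphans(node_unit))        # []
```

With per-document replication the second node holds B1 and B2 without A; with unit replication it holds nothing, so there are no orphans.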

Regards,

Brian.

Re: all_or_nothing=true and replication

Posted by Damien Katz <da...@apache.org>.

On Nov 1, 2009, at 5:07 PM, Brian Candler wrote:

> At http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API I see the
> following:
>
> "Bulk updates work independently of replication, meaning document revisions
> originally saved as part of an all or nothing transaction will be replicated
> individually, not as part of a bulk transaction. This means other replica
> instances may only have a subset of the transaction, and if an update is
> rejected by the remote node during replication (e.g. not authorized error)
> the remote node may never have the complete transaction."
>
> I had a vague idea from the original discussions that these transactions
> would remain together, but this appears to be wrong.
>
> I'd just like to ask if the lack of boundaries for replication is
> intentional behaviour, or just an artefact of the current implementation
> which might change?

It's intentional behavior. Documents are meant to be independent, even
conflicts.

>
> I can think of circumstances where it might be useful to keep them together.
> Consider, for example, an all_or_nothing transaction which is used to
> resolve a conflict between three documents: it does this by writing a new
> revision, and deleting the other two revisions.
>
> If this set of updates became split upon replication, it might end up
> deleting the two old revisions but not updating the other document; thus you
> would have lost data from those two revisions.
>
> Any thoughts or comments?

Yes, you can make a situation where somehow a user has legal updates  
to certain conflicts, but not others, on a particular node A. On some  
other node B, somehow the security was different and he was allowed to  
update all the docs. Then an attempt to merge all the conflicts into  
the document the user didn't really have edit access to will not be
replicated from node B to node A.

It's a contrived situation, but possible with misconfigured or updated  
security settings that haven't propagated.

However, if you are doing this as a VCS where you keep all the diffs, then
simply put the edit history as diffs into even the deleted documents.  
Deleted docs actually can have a body and attachments, for just this  
reason. Then no data is lost even when weird security settings  
disallow only certain replicated edits.
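
A deletion that keeps its history might be built like the sketch below. The "history" field name, the helper name and the sample values are assumptions made for illustration; only _id, _rev and _deleted are CouchDB's own fields.

```python
def deletion_with_history(doc_id, rev, diffs):
    """Build a deleted revision that still carries the edit history in its
    body, so the history survives even if the merged document is rejected."""
    return {
        "_id": doc_id,
        "_rev": rev,
        "_deleted": True,
        "history": list(diffs),  # hypothetical field holding the diffs
    }

# Hypothetical id, revision and diffs, for illustration only.
stub = deletion_with_history("mydoc", "2-bbb", ["+ line added", "- line removed"])
```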

-Damien

>
> Regards,
>
> Brian.