Posted to user@couchdb.apache.org by Alexander Bolodurin <al...@gmail.com> on 2012/10/25 02:29:41 UTC

Resolving replication conflicts for deleted documents in CouchDB

Hi,

(I have asked this on StackOverflow, but, unsurprisingly, the question didn't get much attention.)

I'm designing replication conflict handling for a system, and one of its assumptions is that deletion always takes precedence when resolving conflicts: a deleted document stays deleted regardless of what edits it conflicts with, and IDs are never reused.

The "official" way of resolving replication conflicts (read the conflicting revisions, merge them in application code, delete the unwanted revisions) does not work for deleted documents. If a document is edited on instance 1 and deleted on instance 2, then after replication both instances show the edited revision from instance 1. Because only one leaf revision is alive, the document ends up "undeleted" and reports no conflicts. The deleted revision ends up in the _deleted_conflicts field instead of _conflicts, but I can't use _deleted_conflicts as a sign that the document was deleted: it also includes deleted revisions left over from resolving ordinary edit conflicts, and from documents that were deleted and then re-added, so it is too general and conflates several cases.

How can I get around this at the CouchDB level? Moving it up to the application layer gets really hairy really quickly: I would need my own custom "deleted" flag, rewritten views, more code to test, and extra batch jobs to clean up records marked for deletion.

Regards,
Alex.

Re: Resolving replication conflicts for deleted documents in CouchDB

Posted by Alexander Bolodurin <al...@gmail.com>.
Thanks,

This is what I suspected; it looks like we have to roll our own "deleted" state if we want to handle this case.

I don't think the fact that a deleted document may contain arbitrary attributes helps, because I'd still have to examine the _deleted_conflicts list or open_revs just to check whether it was deleted. This means I'd always have to re-check every document that has ever had a conflict, every single time, because its _deleted_conflicts will stay non-empty forever (and can grow without bound), and there is no way to tell which deleted revisions came from a user deletion rather than from conflict resolution without reading them.
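A minimal sketch of the application-level approach, assuming a hypothetical `deleted` field in the document body serves as the custom flag. Given the current winner and the other live conflicting revisions (for example, found with `?conflicts=true` and then fetched by revision), it builds a `_bulk_docs` request body that writes one resolved revision and tombstones the rest, keeping the flag set if any revision carried it, so deletion always wins. `resolve_deletion_wins` is an illustrative name, and apart from the flag it does no field-level merging: all other fields come from the winner.

```python
def resolve_deletion_wins(doc_id: str, winner: dict, losers: list[dict]) -> dict:
    """Build a _bulk_docs body that resolves one document's live conflicts.

    winner: body of the current winning revision (including "_id" and "_rev").
    losers: bodies of the other live leaf revisions (each including "_rev").
    The "deleted" field is a hypothetical application-level flag.
    """
    # Start from a copy of the winner; writing it with the winner's _rev
    # creates a new child revision on that branch.
    merged = dict(winner)
    # Deletion wins: if any conflicting revision was flagged, keep the flag.
    if any(rev.get("deleted") for rev in [winner, *losers]):
        merged["deleted"] = True
    docs = [merged]
    # Close every other live branch with a real CouchDB tombstone so the
    # document no longer reports _conflicts.
    for rev in losers:
        docs.append({"_id": doc_id, "_rev": rev["_rev"], "_deleted": True})
    return {"docs": docs}
```

The returned dict would be serialised to JSON and POSTed to /db/_bulk_docs. Views would then have to skip documents whose `deleted` flag is set, which is part of the extra work the question describes.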

On 26/10/2012, at 1:29 AM, Robert Newson wrote:



Re: Resolving replication conflicts for deleted documents in CouchDB

Posted by Robert Newson <rn...@apache.org>.
Hi,

Thanks for clarifying. I don't think you can achieve your desired
result at a lower level than your proposal to use your own deleted
flag (and account for that in views, etc). Does it help at all that a
deleted document can contain any set of properties you like? The
DELETE method translates internally to a PUT {_id:id, _rev:new_rev,
_deleted:true}. You can delete a document by adding _deleted:true and
keep any properties you like in there.
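Since a tombstone can carry any properties, one option is to record why a revision was deleted. The sketch below builds the body for a PUT that deletes a revision while keeping its fields, plus a check over the leaf revisions. The `deleted_reason` field and its `"user"`/`"conflict"` values are hypothetical conventions, not anything CouchDB defines, and the function names are illustrative. As Alex points out in his reply, this still means reading the leaf revisions (e.g. via `?open_revs=all`) before the origin of a deletion can be told apart.

```python
def tombstone(current: dict, reason: str) -> dict:
    """Body for a PUT /db/{id} that deletes `current` but keeps its fields."""
    body = dict(current)  # keeps _id, the _rev being deleted, and the data
    body["_deleted"] = True
    # Hypothetical convention: record why the revision was deleted.
    body["deleted_reason"] = reason
    return body


def has_user_deletion(leaf_docs: list[dict]) -> bool:
    """True if any leaf (e.g. from ?open_revs=all) is a user deletion."""
    return any(
        doc.get("_deleted") and doc.get("deleted_reason") == "user"
        for doc in leaf_docs
    )
```

A conflict handler would write `tombstone(loser, "conflict")` when pruning a branch and `tombstone(doc, "user")` for a real deletion, then treat the document as deleted whenever `has_user_deletion` is true for its leaves.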

Btw, I stopped populating StackOverflow with answers when they started
abusing their contact database.

B.

On 25 October 2012 14:47, Alexander Bolodurin
<al...@gmail.com> wrote:

Re: Resolving replication conflicts for deleted documents in CouchDB

Posted by Alexander Bolodurin <al...@gmail.com>.
Thanks Robert,

I understand the mechanics, but it doesn't quite solve my problem yet.

In your example it's clear: one replica edits foo, another deletes foo, so both will see a live revision and a _deleted one.
But that's not the only case. If I resolve a regular edit conflict by deleting one revision, the result is identical (as it should be).
Except that in the second case I must not delete the live revision: the deleted revision was introduced by conflict resolution, and the user hasn't deleted anything.

As far as I can tell, there is no way to tell the "origin" of a deleted revision, at least this way.

Example: https://gist.github.com/3952603

On 25/10/2012, at 11:17 PM, Robert Newson wrote:



Re: Resolving replication conflicts for deleted documents in CouchDB

Posted by Robert Newson <rn...@apache.org>.
A deletion is just an update. The algorithm that CouchDB uses to
choose one leaf out of many deliberately chooses _deleted:false over
_deleted:true.
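A rough Python model of the rule Robert describes may help. It assumes the ordering described in CouchDB's documentation on conflicts: a live leaf beats a deleted one, then the higher revision number wins, then the revision hash compared as a string. It also shows where the losing leaves are reported: live ones under `_conflicts`, deleted ones under `_deleted_conflicts`. `Leaf`, `winner_key` and `describe` are hypothetical names; this is a sketch of the behaviour, not CouchDB's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Leaf:
    rev: str       # full revision id, e.g. "2-7051cbe5c8faecd085a3fa619e6e6337"
    deleted: bool  # True if this leaf revision is a tombstone (_deleted:true)


def winner_key(leaf: Leaf) -> tuple:
    # Live leaves beat deleted ones; then the higher generation (the number
    # before "-") wins; ties fall back to comparing the hash part as a string.
    pos, _, rev_hash = leaf.rev.partition("-")
    return (not leaf.deleted, int(pos), rev_hash)


def describe(leaves: list[Leaf]) -> Optional[dict]:
    """Model what GET /db/doc?conflicts=true&deleted_conflicts=true reports.

    Returns None when the winning leaf is deleted, mirroring the 404
    {"error":"not_found","reason":"deleted"} response.
    """
    winner = max(leaves, key=winner_key)
    if winner.deleted:
        return None
    losers = [leaf for leaf in leaves if leaf is not winner]
    return {
        "_rev": winner.rev,
        "_conflicts": [leaf.rev for leaf in losers if not leaf.deleted],
        "_deleted_conflicts": [leaf.rev for leaf in losers if leaf.deleted],
    }
```

Fed the two leaves from the open_revs=all output below, `describe` picks 2-7051cbe5c8faecd085a3fa619e6e6337 and lists 2-eec205a9d413992850a6e32678485900 under `_deleted_conflicts`. If both leaves were live, the hash comparison would favour 2-eec205a9d413992850a6e32678485900 instead, so the deletion flag alone hands the win to the edit. After the final DELETE the leaves are 3-7379b9e515b161226c6559d90c4dc49f and 2-eec205a9d413992850a6e32678485900, both deleted, and `describe` returns None, matching the 404.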

Here's a test run I just performed on couchdb/master:

# set up instance #1
curl localhost:5984/alex -XPUT
{"ok":true}

curl localhost:5984/alex/foo -XPUT -d{}
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}

# set up identical instance #2
curl localhost:5984/alex2 -XPUT
{"ok":true}

curl localhost:5984/alex2/foo -XPUT -d{}
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}

# update doc in instance #1
curl localhost:5984/alex/foo -XPUT -d '{"_rev":"1-967a00dff5e02add41819138abb3284d"}'

# delete doc in instance #2
curl 'localhost:5984/alex2/foo?rev=1-967a00dff5e02add41819138abb3284d' -XDELETE

curl localhost:5984/_replicate -Hcontent-type:application/json -d '{"source":"alex2","target":"alex"}'
{"ok":true,"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","source_last_seq":2,"replication_id_version":3,"history":[{"session_id":"ed33d539fe675ac22b76c0a7be3fe1bf","start_time":"Thu, 25 Oct 2012 12:10:54 GMT","end_time":"Thu, 25 Oct 2012 12:10:54 GMT","start_last_seq":0,"end_last_seq":2,"recorded_seq":2,"missing_checked":1,"missing_found":1,"docs_read":1,"docs_written":1,"doc_write_failures":0}]}

curl localhost:5984/alex/foo
{"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}

curl 'localhost:5984/alex/foo?open_revs=all'
--2b1fcadf47010c46a3afa22b7533dd07
Content-Type: application/json

{"_id":"foo","_rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}
--2b1fcadf47010c46a3afa22b7533dd07
Content-Type: application/json

{"_id":"foo","_rev":"2-eec205a9d413992850a6e32678485900","_deleted":true}
--2b1fcadf47010c46a3afa22b7533dd07--

As you can see, the first database, alex, will show the non-deleted
doc as per our algorithm, but the doc has two leaf revisions now. To
resolve in the direction you want, delete the
2-7051cbe5c8faecd085a3fa619e6e6337 revision:

curl 'localhost:5984/alex/foo?rev=2-7051cbe5c8faecd085a3fa619e6e6337' -XDELETE
{"ok":true,"id":"foo","rev":"3-7379b9e515b161226c6559d90c4dc49f"}

curl 'localhost:5984/alex/foo'
{"error":"not_found","reason":"deleted"}
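The manual resolution above can be sketched in Python: parse the multipart body returned by `?open_revs=all` and, if any leaf is deleted, list the live leaves that would have to be deleted for the deletion to win. This assumes documents without attachments (each part is a single JSON body) and takes the boundary from the transcript; in practice it would come from the response's Content-Type header. It also assumes every deleted leaf is a genuine user deletion, which is exactly the distinction Alex says CouchDB alone cannot make. `parse_open_revs` and `live_leaves_to_delete` are hypothetical helpers.

```python
import json


def parse_open_revs(raw: str, boundary: str) -> list[dict]:
    """Split a multipart ?open_revs=all body into its JSON documents."""
    docs = []
    for part in raw.split("--" + boundary):
        part = part.strip()
        if not part or part == "--":
            continue  # preamble before the first delimiter, or the closing "--"
        # Headers and body are separated by the first blank line.
        normalized = part.replace("\r\n", "\n")
        _headers, _, body = normalized.partition("\n\n")
        docs.append(json.loads(body))
    return docs


def live_leaves_to_delete(leaf_docs: list[dict]) -> list[str]:
    """If any leaf is deleted, return the revs of the live leaves to delete.

    Assumes every deleted leaf represents a real user deletion.
    """
    if not any(doc.get("_deleted") for doc in leaf_docs):
        return []
    return [doc["_rev"] for doc in leaf_docs if not doc.get("_deleted")]
```

For the open_revs output shown above, this returns ["2-7051cbe5c8faecd085a3fa619e6e6337"], the revision deleted by hand in the last step.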

B.

On 25 October 2012 01:29, Alexander Bolodurin
<al...@gmail.com> wrote: