Posted to dev@couchdb.apache.org by Pepijn de Vos <pe...@yahoo.com> on 2011/09/09 15:37:45 UTC

[proposal] Update Handler Conflict Resolution

Hi,

Today I had a long and complicated discussion involving rnewson, jan____, me and muhqu.

I had wrongly assumed that Document Update Handlers perform atomic updates. They don't, but for most of my use cases they could.

What currently happens is that the update handler is executed with the latest rev of the requested doc. When the handler completes, its result is committed; if another update has happened in the meantime, a conflict arises and a 409 is returned.

When I use update handlers, I mostly use idempotent functions. This means that it is safe for the update handler to retry on its own account, and in doing so, avoid a ton of latency and headaches.

However, it turns out that programmatic updates are not the sole use for update handlers, so rnewson argued that automatic retries might destroy data.

Proposal 1:
Add an idempotent=true parameter to the handler, allowing it to retry on its own.

Proposal 2:
Add an update() function so the handler can handle conflicts in itself.
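
To make the distinction concrete, here is a minimal sketch of two update handlers using the usual function(doc, req) form that returns [doc, response]. The names incrementCounter and replaceFromBody are illustrative only. The first is safe to re-run against a newer rev; the second shows the kind of handler where a silent retry could discard someone else's changes, because the request body was written against an older rev. It assumes the document already exists.

```javascript
// Safe to re-run: the result depends only on the latest doc and the request.
function incrementCounter(doc, req) {
  if (!doc) {
    // No document yet: create one using the id from the request.
    doc = { _id: req.id, count: 0 };
  }
  doc.count += 1;
  return [doc, "count is now " + doc.count];
}

// Risky to re-run blindly: the body replaces the whole document, so a retry
// against a newer rev drops fields that another writer added meanwhile.
function replaceFromBody(doc, req) {
  var edited = JSON.parse(req.body);
  edited._id = doc._id;
  edited._rev = doc._rev;
  return [edited, "saved"];
}
```

Re-running incrementCounter on a fresh rev still yields one increment per request, while re-running replaceFromBody on a fresh rev overwrites whatever changed in between.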

I cannot program in Erlang. I think I might be able to work out option 1, but not option 2. So if option 2 is desired, someone else will have to write it.

Pepijn

Re: [proposal] Update Handler Conflict Resolution

Posted by Pepijn de Vos <pe...@yahoo.com>.

Yes, I would imagine the update function tries to commit the result. Maybe update could itself return the new document?

I think rnewson or jan____ said there was already some sort of timeout in place.

Pepijn

On Sep 11, 2011, at 4:54 AM, Randall Leeds wrote:

> On Fri, Sep 9, 2011 at 06:37, Pepijn de Vos <pe...@yahoo.com> wrote:
> 
>> Hi,
>> 
>> Today I had a long and complicated discussion involving rnewson, jan____,
>> me and muhqu.
>> 
>> I was falsely assuming Document Update Handlers did atomic updates. Truth
>> is that they don't, but for most of my use cases could.
>> 
>> What currently happens is that the update handler gets executed with the
>> latest rev of the requested doc. When the update is completed, it is
>> committed, but when another update has meanwhile happened, a conflict arises
>> and 409 is returned.
>> 
>> When I use update handlers, I mostly use idempotent functions. This means
>> that it is safe for the update handler to retry on its own account, and in
>> doing so, avoid a ton of latency and headaches.
>> 
>> Only, it turns out that programmatic updates are not the sole use for
>> update handlers, so rnewson argued that it might destroy data.
>> 
>> Proposal 1:
>> Add an idempotent=true parameter to the handler, allowing it to retry on
>> its own.
>> 
>> Proposal 2:
>> Add an update() function so the handler can handle conflicts in itself.
>> 
> 
> Just so I understand... is update() a function called from within the
> handler that tries to commit the result?
> If this fails, should there be a function to retrieve the latest version of
> the document again?
> I sort of like this approach. Proposal 1 makes some sense but I worry about
> a request potentially being retried 'forever' and not sure how to impose
> limits on that without expanding our configuration space or HTTP API surface
> area when proposal 2 is perhaps cleaner.
> 
> -Randall

Re: [proposal] Update Handler Conflict Resolution

Posted by Alexander Shorin <kx...@gmail.com>.

On Thu, Sep 15, 2011 at 5:58 PM, Jan Lehnardt <ja...@apache.org> wrote:
>
> On Sep 11, 2011, at 21:38 , Alexander Shorin wrote:
>
>> Hi!
>>
>> Sorry for stupid question, but is there any reasons why _update
>> handlers should have custom conflict resolution logic while simple
>> document store API not?
>
> They do not.
>
> To update a document through the document REST API, you do:
>
> 1. get document
> 2. change document
> 3. save document
>
> 1. and 3. are operations that go over HTTP.
>
> An update function does this:
>
> 1. get document
> 2. change document
> 3. save document
>
> All of this is triggered with a single HTTP request, but the operations as far as CouchDB is concerned are exactly the same.
>
> In both situations, when 3 is executed both the validation function and the regular conflict handling is applied.
>
>
> Cheers
> Jan
> --

I understand this well, but this discussion is about adding some new
conflict handlers only for update functions, right?

As for use cases, you could also do:

>
> 1. get document
> 2. change document
> 3. save document
>
> 1. and 3. are operations that go over HTTP.

for update handlers that implement some preprocessing or data
normalization work, and there wouldn't be any atomic operation. A
real-world example is easy to provide: one server, a lot of old clients,
a few new ones, and a data protocol that changes without breaking
compatibility. By wrapping all document operations in design functions
we could make such changes more safely, without external tools.


>
>
>> I think that better to implement some update_conflict() functions
>> which acts like validate_doc_update one - globally for all database,
>> one per design document - instead of dividing documents store logic.
>> This way will make things more clear and rules to be universal, not
>> with exceptions for some cases.
>>
>> In this way update_conflict function could looks like this:
>>
>> function update_conflict(stored_doc, actual_doc, update_handler, userCtx){
>>  ...
>> }
>> where:
>> stored_doc - document which was tried to store, but failed
>> actual_doc - actual document with same id
>> update_handler - null if document was stored by simple API or _update
>> handler function object or any reference to get _update function
>> userCtx - user context object
>>
>> But even this solution would save anyone from race conditions(:
>>
>> --
>> ,,,^..^,,,
>
>

Re: [proposal] Update Handler Conflict Resolution

Posted by Jan Lehnardt <ja...@apache.org>.

On Sep 11, 2011, at 21:38 , Alexander Shorin wrote:

> Hi!
> 
> Sorry for stupid question, but is there any reasons why _update
> handlers should have custom conflict resolution logic while simple
> document store API not?

They do not.

To update a document through the document REST API, you do:

1. get document
2. change document
3. save document

1. and 3. are operations that go over HTTP.

An update function does this:

1. get document
2. change document
3. save document

All of this is triggered with a single HTTP request, but the operations as far as CouchDB is concerned are exactly the same.

In both situations, when 3 is executed both the validation function and the regular conflict handling is applied.
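
The three steps can be sketched with an in-memory stand-in for the database. This is only an illustration: createStore and its integer revs are assumptions made for the example (real revs are strings, and a real save also runs the validation function). It shows why the second of two writers that both read the same rev gets a 409.

```javascript
// In-memory stand-in for a database: id -> { rev: number, body: object }.
function createStore() {
  var docs = {};
  return {
    // Step 1: get document (returns a copy carrying _id and _rev).
    get: function (id) {
      var entry = docs[id];
      if (!entry) return null;
      return Object.assign({ _id: id, _rev: entry.rev }, entry.body);
    },
    // Step 3: save document. Rejected if _rev is not the current rev.
    save: function (doc) {
      var entry = docs[doc._id];
      var currentRev = entry ? entry.rev : 0;
      var givenRev = doc._rev || 0;
      if (givenRev !== currentRev) {
        return { status: 409, error: "conflict" };
      }
      var body = Object.assign({}, doc);
      delete body._id;
      delete body._rev;
      docs[doc._id] = { rev: currentRev + 1, body: body };
      return { status: 201, rev: currentRev + 1 };
    }
  };
}

var store = createStore();
store.save({ _id: "counter", count: 0 });

var a = store.get("counter"); // writer A reads the current rev
var b = store.get("counter"); // writer B reads the same rev
a.count += 1;                 // Step 2: change document
b.count += 1;

var first = store.save(a);    // accepted
var second = store.save(b);   // rejected: B's rev is stale
```

Whether steps 1 and 3 travel over HTTP or happen inside an update function, the rev check in save is the same.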

Cheers
Jan
-- 

> I think that better to implement some update_conflict() functions
> which acts like validate_doc_update one - globally for all database,
> one per design document - instead of dividing documents store logic.
> This way will make things more clear and rules to be universal, not
> with exceptions for some cases.
> 
> In this way update_conflict function could looks like this:
> 
> function update_conflict(stored_doc, actual_doc, update_handler, userCtx){
>  ...
> }
> where:
> stored_doc - document which was tried to store, but failed
> actual_doc - actual document with same id
> update_handler - null if document was stored by simple API or _update
> handler function object or any reference to get _update function
> userCtx - user context object
> 
> But even this solution would save anyone from race conditions(:
> 
> --
> ,,,^..^,,,

Re: [proposal] Update Handler Conflict Resolution

Posted by Alexander Shorin <kx...@gmail.com>.

Hi!

Sorry for the stupid question, but is there any reason why _update
handlers should have custom conflict resolution logic while the simple
document store API does not?
I think it would be better to implement an update_conflict() function
that acts like validate_doc_update (applied globally to the whole
database, one per design document) instead of splitting the document
store logic. This would make things clearer and keep the rules
universal, without exceptions for some cases.

The update_conflict function could then look like this:

function update_conflict(stored_doc, actual_doc, update_handler, userCtx){
  ...
}
where:
stored_doc - the document whose store attempt failed
actual_doc - the current document with the same id
update_handler - null if the document was stored through the simple API;
otherwise the _update handler function object, or any reference from
which the _update function can be obtained
userCtx - user context object
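
As a rough sketch of what such a function body might do, the example below fills in the signature above with one possible policy. Everything beyond the signature is an assumption: the return convention (a merged document to store, or null to let the conflict stand) and the merge rule (keep the current document and add only fields it lacks) are invented for illustration.

```javascript
function update_conflict(stored_doc, actual_doc, update_handler, userCtx) {
  // Assumed policy: only resolve writes that came through an _update handler;
  // for the simple API, return null so the conflict is reported as usual.
  if (update_handler === null) {
    return null;
  }
  // Start from the current document, including its _rev.
  var merged = {};
  Object.keys(actual_doc).forEach(function (key) {
    merged[key] = actual_doc[key];
  });
  // Add fields from the failed write that the current document lacks.
  Object.keys(stored_doc).forEach(function (key) {
    if (key === "_rev") return;
    if (!(key in actual_doc)) {
      merged[key] = stored_doc[key];
    }
  });
  return merged;
}
```

With this policy, a field present in both documents keeps the value from actual_doc, so the failed write never overwrites newer data.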

But even this solution wouldn't save anyone from race conditions (:

--
,,,^..^,,,

Re: [proposal] Update Handler Conflict Resolution

Posted by Pepijn de Vos <pe...@yahoo.com>.

Okay, so a new config value will have to be introduced?

Where would be the best place to specify whether a function can be retried? In the request? That's the most flexible, and I don't see any harm, but I don't see a need either. Somewhere in the ddoc maybe?

What is the process for contributing a patch? I'm no Erlang programmer, but I'd like to keep the ball rolling.

Pepijn

On Sep 15, 2011, at 7:42 PM, Randall Leeds wrote:

> On Thu, Sep 15, 2011 at 09:56, Pepijn de Vos <pe...@yahoo.com> wrote:
> 
>>> I think proposal 1 makes more sense. Open for suggestions
>>> about how the limits look.
>> 
>> I'm also for proposal 1. I'm not an expert, but we have os_process_timeout
>> already, wouldn't that work?
>> 
> 
> That's for killing/reaping os processes that become unresponsive. This
> thread suggests a system wherein the query server is responding, but being
> asked to repeat the operation because there was a conflict.

Re: [proposal] Update Handler Conflict Resolution

Posted by Randall Leeds <ra...@gmail.com>.

On Thu, Sep 15, 2011 at 09:56, Pepijn de Vos <pe...@yahoo.com> wrote:

> > I think proposal 1 makes more sense. Open for suggestions
> > about how the limits look.
>
> I'm also for proposal 1. I'm not an expert, but we have os_process_timeout
> already, wouldn't that work?
>

That's for killing/reaping os processes that become unresponsive. This
thread suggests a system wherein the query server is responding, but being
asked to repeat the operation because there was a conflict.

Re: [proposal] Update Handler Conflict Resolution

Posted by Pepijn de Vos <pe...@yahoo.com>.

> I think proposal 1 makes more sense. Open for suggestions
> about how the limits look.

I'm also for proposal 1. I'm not an expert, but we have os_process_timeout already, wouldn't that work?

Now that I'm thinking about the location of things, it should probably be the update handler that specifies it can be retried. I don't imagine this varying across requests for the same handler.

Pepijn

Re: [proposal] Update Handler Conflict Resolution

Posted by Randall Leeds <ra...@gmail.com>.

On Sun, Sep 11, 2011 at 12:11, Jason Smith <jh...@iriscouch.com> wrote:

> Hi, Randall. May I refer to the update() function as store() to avoid
> overloading the word "update"?
>
>    var store = update; // Hopefully this clarifies the email.
>
> Also, I use ?retry=true instead of ?indempotent=true because we are
> talking about either 0 or 1 total document updates. _update runs until
> it works; it doesn't ever update the same document twice in one query.
>
> On Sun, Sep 11, 2011 at 9:54 AM, Randall Leeds <ra...@gmail.com>
> wrote:
> > Just so I understand... is update() a function called from within the
> > handler that tries to commit the result?
> > If this fails, should there be a function to retrieve the latest version
> of
> > the document again?
> > I sort of like this approach. Proposal 1 makes some sense but I worry
> about
> > a request potentially being retried 'forever' and not sure how to impose
> > limits on that without expanding our configuration space or HTTP API
> surface
> > area when proposal 2 is perhaps cleaner.
>
> I'm curious what makes proposal 2 cleaner. I'm unsure of two things:
>
> Firstly, what if an _update function calls store() and also returns a
> document? What if it calls store() twice? Am I misunderstanding?
>
> Secondly, won't this require modifying the view server protocol?
>
> Proposal 1, yes, adds one parameter to the API. But it does not change
> Couch conceptually. Instead of high-latency retry loops, you get
> low-latency retry loops. _update itself follows this philosophy, being
> API sugar for a low-latency fetch-modify-store loop.
>
> In the _update handler code,
>
>
> https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_httpd_show.erl#L125
>
> It seems like send_doc_update_response could have an `Attempts`
> parameter, initially 0. If couch_db:update_doc throws a conflict and
> ?retry=true, re-enter the function with Attempts+1. After 2 or 3
> attempts, it would re-throw. Is this a possible approach?
>

Sounds reasonable. You're right about my assessment and how much it would
change things. I think proposal 1 makes more sense. Open for suggestions
about how the limits look.

First thoughts on where the limit could be set:

Limit set at query time:
1) It may be desirable not to let clients dictate the use of server
resources so explicitly
2) Super flexible, but is that useful?

Limit set in the configuration system:
1) Lets the owner of the system deployment set limits on this. If this is
desirable I expect you'll be the first to tell me.

Limit on the design document itself:
1) Not sure this has a good use case, but it's a natural middle ground
between the other two so I bring it up.
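
One way the three places could combine, purely as a sketch: a per-request value overrides a per-design-document value, and the server configuration caps both. The names retries, update_retries and update_retry_cap are hypothetical and do not exist in CouchDB.

```javascript
// query:  parsed query string of the request, e.g. { retries: "5" }
// ddoc:   the design document, e.g. { update_retries: 2 }
// config: server settings, e.g. { update_retry_cap: 3 }
function resolveRetryLimit(query, ddoc, config) {
  var requested;
  if (query.retries !== undefined) {
    requested = parseInt(query.retries, 10);   // most specific: the request
  } else if (ddoc.update_retries !== undefined) {
    requested = ddoc.update_retries;           // middle ground: the ddoc
  } else {
    requested = 0;                             // default: no retries
  }
  if (isNaN(requested) || requested < 0) {
    requested = 0;
  }
  // The deployment owner always has the last word.
  return Math.min(requested, config.update_retry_cap);
}
```

Capping with the server setting addresses the concern about clients dictating resource use while keeping the query-time flexibility.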

Anything else?



>
> Thanks!
>
> --
> Iris Couch
>

Re: [proposal] Update Handler Conflict Resolution

Posted by Jason Smith <jh...@iriscouch.com>.

Hi, Randall. May I refer to the update() function as store() to avoid
overloading the word "update"?

    var store = update; // Hopefully this clarifies the email.

Also, I use ?retry=true instead of ?idempotent=true because we are
talking about either 0 or 1 total document updates. _update runs until
it works; it doesn't ever update the same document twice in one query.

On Sun, Sep 11, 2011 at 9:54 AM, Randall Leeds <ra...@gmail.com> wrote:
> Just so I understand... is update() a function called from within the
> handler that tries to commit the result?
> If this fails, should there be a function to retrieve the latest version of
> the document again?
> I sort of like this approach. Proposal 1 makes some sense but I worry about
> a request potentially being retried 'forever' and not sure how to impose
> limits on that without expanding our configuration space or HTTP API surface
> area when proposal 2 is perhaps cleaner.

I'm curious what makes proposal 2 cleaner. I'm unsure of two things:

Firstly, what if an _update function calls store() and also returns a
document? What if it calls store() twice? Am I misunderstanding?

Secondly, won't this require modifying the view server protocol?

Proposal 1, yes, adds one parameter to the API. But it does not change
Couch conceptually. Instead of high-latency retry loops, you get
low-latency retry loops. _update itself follows this philosophy, being
API sugar for a low-latency fetch-modify-store loop.

In the _update handler code,

    https://github.com/apache/couchdb/blob/trunk/src/couchdb/couch_httpd_show.erl#L125

It seems like send_doc_update_response could have an `Attempts`
parameter, initially 0. If couch_db:update_doc throws a conflict and
?retry=true, re-enter the function with Attempts+1. After 2 or 3
attempts, it would re-throw. Is this a possible approach?
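
The idea can be modelled in JavaScript rather than Erlang, as a sketch only. The function below borrows the name send_doc_update_response but is not the real code; the db object, the error name "conflict", MAX_ATTEMPTS = 3 and the helpers makeFlakyDb and bump are all assumptions for the example.

```javascript
var MAX_ATTEMPTS = 3; // "2 or 3" above; the exact value is an assumption

// Run the handler against the latest doc; on conflict, re-enter with
// attempts + 1 if retry is on, otherwise re-throw.
function sendDocUpdateResponse(db, handler, docId, req, retry, attempts) {
  attempts = attempts || 0;
  var result = handler(db.get(docId), req); // fetch + modify
  try {
    db.save(result[0]);                     // store
    return result[1];
  } catch (err) {
    var canRetry = retry && err.name === "conflict" &&
                   attempts + 1 < MAX_ATTEMPTS;
    if (!canRetry) throw err;
    return sendDocUpdateResponse(db, handler, docId, req, retry, attempts + 1);
  }
}

// Fake database whose first `failures` saves throw a conflict.
function makeFlakyDb(failures) {
  var doc = { _id: "d", n: 0 };
  var saves = 0;
  return {
    get: function () { return { _id: doc._id, n: doc.n }; },
    save: function (newDoc) {
      saves += 1;
      if (saves <= failures) {
        var err = new Error("Document update conflict.");
        err.name = "conflict";
        throw err;
      }
      doc = newDoc;
    },
    saveCount: function () { return saves; }
  };
}

// Example handler: increments a field on whatever doc it is given.
function bump(doc, req) {
  doc.n += 1;
  return [doc, "ok"];
}
```

Each retry fetches the document again, so the handler always runs against the latest rev and at most one update is ever committed per request.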

Thanks!

-- 
Iris Couch

Re: [proposal] Update Handler Conflict Resolution

Posted by Randall Leeds <ra...@gmail.com>.

On Fri, Sep 9, 2011 at 06:37, Pepijn de Vos <pe...@yahoo.com> wrote:

> Hi,
>
> Today I had a long and complicated discussion involving rnewson, jan____,
> me and muhqu.
>
> I was falsely assuming Document Update Handlers did atomic updates. Truth
> is that they don't, but for most of my use cases could.
>
> What currently happens is that the update handler gets executed with the
> latest rev of the requested doc. When the update is completed, it is
> committed, but when another update has meanwhile happened, a conflict arises
> and 409 is returned.
>
> When I use update handlers, I mostly use idempotent functions. This means
> that it is safe for the update handler to retry on its own account, and in
> doing so, avoid a ton of latency and headaches.
>
> Only, it turns out that programmatic updates are not the sole use for
> update handlers, so rnewson argued that it might destroy data.
>
> Proposal 1:
> Add an idempotent=true parameter to the handler, allowing it to retry on
> its own.
>
> Proposal 2:
> Add an update() function so the handler can handle conflicts in itself.
>

Just so I understand... is update() a function called from within the
handler that tries to commit the result?
If this fails, should there be a function to retrieve the latest version of
the document again?
I sort of like this approach. Proposal 1 makes some sense, but I worry about
a request potentially being retried 'forever', and I'm not sure how to impose
limits on that without expanding our configuration space or HTTP API surface
area, whereas proposal 2 is perhaps cleaner.

-Randall