Posted to user@couchdb.apache.org by Guby <gu...@gmail.com> on 2008/05/05 19:58:48 UTC

My CouchDB feature wish number 1: partial updating

Hi guys!

Having schemaless documents like in CouchDB opens up a lot of cool
possibilities, as we all know. You can, e.g., store all sorts of related
data in one document, and different documents can store different
amounts and types of data.

In theory this is all great, but in practice I have had a lot of
problems when:

1. I want to make a small change to a document. Then I have to load ALL
its data (which for big documents means a huge overhead) just so I can
store back the complete document with the change.
2. Several processes want to perform small updates on the same
document. Then I get a lot of conflict errors.

In practice this has led me to store my data in numerous smaller
documents and to store their relationships as fields holding the ID
of the parent object.

If partial updating could be implemented, it would solve all of this! I
have no idea how hard it would be for you guys to implement, but from
my side I would like it to work something like this:

We have the following document stored on the server:

{
	_id: "foo",
	revision: "123",
	data: {
		days: [1,2,3,4,5],
		horses: [{
			name:"kaspar",
			races_won: 10
		},
		{
			name:"greg",
			races_won: 0
		}]
	},
	pizzas_eaten: 15
};

We could have two processes working on the document:

Process 1 changes the number of pizzas eaten by sending back the ID of
the document it wants to change and the revision it is currently at,
along with the changed data, like this:

PUT {
	_id: "foo",
	revision: "123",
	_update: {
		pizzas_eaten: 20	
	}
}

and gets back the new revision number 234.

Process 2, which is still at revision 123, can change the value of
data.days without getting any conflicts by PUTing the following data:

PUT {
	_id: "foo",
	revision: "123",
	_update: {
		"data.days": [1,2,3,4,5,6]
	}
}

and gets back the new revision number 345.

Now if Process 1 tries to update the data.days field like this:

PUT {
	_id: "foo",
	revision: "234",
	_update: {
		"data.days": [1,2,3,4,5,6,7,8,9,0]
	}
}

it will get a conflict error, because data.days has been changed since
revision 234 (by the other process; its current value belongs to the
newer revision 345).
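The rule above (a partial update conflicts only if a field it touches has changed since the client's revision) requires the server to remember, per field, the revision that last changed it. A minimal in-memory sketch follows; `PartialStore`, `setPath` and `overlaps` are hypothetical names, not CouchDB APIs, and revisions are modelled as increasing integers (1, 2, 3) rather than CouchDB's revision strings.

```javascript
// Hypothetical sketch: per-path revision tracking for partial updates.
// Revisions are increasing integers, so "changed since" is a plain comparison.

// Set a dotted path such as "data.days", creating missing parent objects.
function setPath(obj, path, value) {
  const keys = path.split(".");
  const leaf = keys.pop();
  let node = obj;
  for (const key of keys) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[leaf] = value;
}

// Two dotted paths overlap if they are equal or one is nested inside the other.
function overlaps(a, b) {
  return a === b || a.startsWith(b + ".") || b.startsWith(a + ".");
}

class PartialStore {
  constructor(doc) {
    this.doc = doc;
    this.rev = 1;
    this.changedAt = {}; // dotted path -> revision in which it last changed
  }

  // Apply `changes` for a client whose last known revision is `baseRev`.
  update(baseRev, changes) {
    for (const path of Object.keys(changes)) {
      for (const [changed, rev] of Object.entries(this.changedAt)) {
        if (rev > baseRev && overlaps(path, changed)) {
          return { ok: false, error: "conflict", path };
        }
      }
    }
    this.rev += 1;
    for (const [path, value] of Object.entries(changes)) {
      setPath(this.doc, path, value);
      this.changedAt[path] = this.rev;
    }
    return { ok: true, rev: this.rev };
  }
}

// The scenario from the post, with revisions 123/234/345 renumbered 1/2/3.
const store = new PartialStore({ data: { days: [1, 2, 3, 4, 5] }, pizzas_eaten: 15 });
const r1 = store.update(1, { pizzas_eaten: 20 });                // process 1
const r2 = store.update(1, { "data.days": [1, 2, 3, 4, 5, 6] }); // process 2, no conflict
const r3 = store.update(2, { "data.days": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] }); // process 1 again
console.log(r1, r2, r3);
```

Treating nested paths as overlapping (an update to "data" clashes with a concurrent change to "data.days") is one possible design choice; the proposal does not say how nested fields should interact.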

You could add new parameters as well:

PUT {
	_id: "foo",
	revision: "234",
	_update: {
		pizzas_eaten_on_avarage_a_day: 0.01
	}
}

Updating a value that doesn't exist could add it.
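A sketch of how a server might apply such an `_update` body, assuming keys are either plain field names or dot-separated paths like "data.days". `applyUpdate` is a hypothetical helper, not part of CouchDB.

```javascript
// Hypothetical helper (not a CouchDB API): apply an `_update` object to a
// document. Keys may be plain field names or dotted paths like "data.days";
// fields that don't exist yet are created.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, original untouched
  for (const [path, value] of Object.entries(update)) {
    const keys = path.split(".");
    const leaf = keys.pop();
    let node = result;
    for (const key of keys) {
      // Create missing (or non-object) intermediate fields on the way down.
      if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
      node = node[key];
    }
    node[leaf] = value;
  }
  return result;
}

const doc = {
  _id: "foo",
  data: { days: [1, 2, 3, 4, 5], horses: [{ name: "kaspar", races_won: 10 }] },
  pizzas_eaten: 15,
};

const updated = applyUpdate(doc, {
  pizzas_eaten: 20,                    // existing top-level field
  "data.days": [1, 2, 3, 4, 5, 6],     // nested field via dotted path
  pizzas_eaten_on_avarage_a_day: 0.01, // new field is simply added
});
console.log(updated);
```

One consequence of this dotted-path convention is that field names which themselves contain a dot can no longer be addressed; a real design would need an escaping rule or an array-of-keys path.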

You could also remove/delete values and rearrange documents:

PUT {
	_id: "foo",
	revision: "456",
	_update: {
		pizzas: {
			eaten: 20,
			daily_avarage: 0.01
		}
	},
	_remove: [
		"pizzas_eaten_on_avarage_a_day",
		"pizzas_eaten"
	]
}

The document would now look like this:

{
	_id: "foo",
	revision: "567",
	data: {
		days: [1,2,3,4,5,6],
		horses: [{
			name:"kaspar",
			races_won: 10
		},
		{
			name:"greg",
			races_won: 0
		}]
	},
	pizzas: {
		eaten: 20,
		daily_avarage: 0.01
	}
};
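The rearrangement step can be sketched the same way, assuming `_remove` is a list of dot-separated paths. The proposal does not say in which order removals and updates apply; this hypothetical `applyChanges` helper removes first and then merges the top-level `_update` keys, and it leaves out revision handling.

```javascript
// Hypothetical helper: delete the field at a dotted path, if it exists.
function removePath(obj, path) {
  const keys = path.split(".");
  const leaf = keys.pop();
  let node = obj;
  for (const key of keys) {
    if (typeof node[key] !== "object" || node[key] === null) return; // nothing to remove
    node = node[key];
  }
  delete node[leaf];
}

// Hypothetical: apply `_remove` paths, then top-level `_update` keys, to a copy.
function applyChanges(doc, { _update = {}, _remove = [] }) {
  const result = JSON.parse(JSON.stringify(doc));
  for (const path of _remove) removePath(result, path); // removals first (assumed order)
  Object.assign(result, JSON.parse(JSON.stringify(_update))); // then the updates
  return result;
}

const before = {
  _id: "foo",
  data: { days: [1, 2, 3, 4, 5, 6] },
  pizzas_eaten: 20,
  pizzas_eaten_on_avarage_a_day: 0.01,
};

const after = applyChanges(before, {
  _update: { pizzas: { eaten: 20, daily_avarage: 0.01 } },
  _remove: ["pizzas_eaten_on_avarage_a_day", "pizzas_eaten"],
});
console.log(after);
```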


The database server would, however, have to keep track of the revision
at which each value was last changed... that might be cumbersome...

It would greatly improve CouchDB's usability in my case, though!

Let me know what you think!

Best regards
Sebastian


Re: My CouchDB feature wish number 1: partial updating

Posted by Ralf Nieuwenhuijsen <ra...@gmail.com>.
2008/5/7 Guby <gu...@gmail.com>:
>
> > One step beyond, you could even say the document-map is a tree. Why
> > even speak of documents and databases?
> > Imagine access like this:
> >
> >  GET couchserver/mydatabase/mydocument/pizzas/eaten
> >
> > To just get part of the document, or to update part of the document.
> > Deleting all documents in the database:
> >
> >  PUT couchserver/mydatabase
> >
> >  {}
> >
>  That is a really cool idea, but then, how do you get views to work? What do
> you pass on to the views for indexing if you don't have a document entity?

Views could receive the 'path' (an array of keys) and the value for
every node in the documents. Something like:

function(path, val)
{
  if (path[0] == 'pizzas' && path.length == 2)
    map(path[1], val);
}

Although there is no longer a flat collection of documents, but rather a
tree of data, that doesn't mean the tree nodes have no identity.
Since a map consists of keys and values, the path of keys is the
unique identifier for the associated data.
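A rough sketch of how such path-based views could be driven: walk every node of the document tree and call the view with the node's key path and value. `runPathView` is hypothetical; Ralf's view calls a free `map()` function to emit rows, so here `map` is passed in as a third argument to keep the sketch standalone.

```javascript
// Hypothetical runner for a path-based view: visit every node of the
// document tree and call the view with the node's key path and value.
function runPathView(doc, view) {
  const rows = [];
  const map = (key, value) => rows.push({ key, value }); // collects emitted rows
  function visit(path, val) {
    if (path.length > 0) view(path, val, map);
    if (val !== null && typeof val === "object") {
      for (const [key, child] of Object.entries(val)) {
        visit([...path, key], child); // array indices become string keys
      }
    }
  }
  visit([], doc);
  return rows;
}

const doc = { pizzas: { eaten: 20, daily_avarage: 0.01 }, data: { days: [1, 2, 3] } };

// Ralf's example view, with `map` passed in explicitly.
const rows = runPathView(doc, (path, val, map) => {
  if (path[0] === "pizzas" && path.length === 2) map(path[1], val);
});
console.log(rows);
```

Here rows ends up with one row per key under pizzas: eaten (20) and daily_avarage (0.01).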

Greetings,
Ralf

Re: My CouchDB feature wish number 1: partial updating

Posted by Guby <gu...@gmail.com>.
> One step beyond, you could even say the document-map is a tree. Why
> even speak of documents and databases?
> Imagine access like this:
>
>  GET couchserver/mydatabase/mydocument/pizzas/eaten
>
> To just get part of the document, or to update part of the document.
> Deleting all documents in the database:
>
>  PUT couchserver/mydatabase
>
>  {}
That is a really cool idea, but then, how do you get views to work?  
What do you pass on to the views for indexing if you don't have a  
document entity?

Best regards
Sebastian

Re: My CouchDB feature wish number 1: partial updating

Posted by Ralf Nieuwenhuijsen <ra...@gmail.com>.
2008/5/7 Michael Hendricks <mi...@ndrix.org>:
> On Wed, May 07, 2008 at 12:00:33AM +0200, Ralf Nieuwenhuijsen wrote:
>  >   - updates solve the replication issue completely; since updates can
>  > easily be merged. (it is the assumed behavior)
>
>  I think there could still be replication conflicts with an update
>  approach.  For instance, if the original document has
>
>     "pizza" : "cheese"
>
>  and in one database I change it to
>
>     "pizza" : "pepperoni"
>
>  and in another I change it to
>
>     "pizza" : "anchovies"
>
>  one still encounters a conflict.  Those two changes can't both succeed.
>  Perhaps I've overlooked some way to work around this.

True. But that's usually not an issue.
What _is_ an issue is when one update changes, say:

  "key1": "someNewValue"

while the other changes:

  "key2": "someotherNewValue"

and one of these disjoint updates is completely ignored (which is
currently the behavior).
I would consider a timestamp on updates to be the best arbiter of conflicts.
I just think the top-level map should be 'merged'.
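A minimal sketch of merging the top-level map, assuming both replicas still have the common ancestor (`base`) available: a key changed on only one side is taken from that side, and a key changed differently on both sides (Michael's pizza case) is a real conflict, resolved here by the later timestamp as suggested above but still reported. `mergeTopLevel` and its timestamp arguments are hypothetical.

```javascript
// Hypothetical three-way merge of a document's top-level map.
// `base` is the common ancestor; `a` and `b` are concurrently edited copies
// written at times `aTime` and `bTime`.
function mergeTopLevel(base, a, b, aTime, bTime) {
  const same = (x, y) => JSON.stringify(x) === JSON.stringify(y);
  const merged = {};
  const conflicts = [];
  // A key whose value is undefined has been deleted on that side.
  const put = (key, value) => {
    if (value !== undefined) merged[key] = value;
  };
  const keys = new Set([...Object.keys(base), ...Object.keys(a), ...Object.keys(b)]);
  for (const key of keys) {
    const aChanged = !same(a[key], base[key]);
    const bChanged = !same(b[key], base[key]);
    if (aChanged && bChanged && !same(a[key], b[key])) {
      // Same key changed differently on both sides: later timestamp wins.
      conflicts.push(key);
      put(key, aTime >= bTime ? a[key] : b[key]);
    } else {
      // At most one side changed (or both made the same change).
      put(key, aChanged ? a[key] : b[key]);
    }
  }
  return { merged, conflicts };
}

// Disjoint changes merge cleanly.
const disjoint = mergeTopLevel(
  { key1: "old", key2: "old" },
  { key1: "someNewValue", key2: "old" },
  { key1: "old", key2: "someotherNewValue" },
  1,
  2
);

// Both sides changed "pizza": reported as a conflict, later write kept.
const pizza = mergeTopLevel(
  { pizza: "cheese" },
  { pizza: "pepperoni" },
  { pizza: "anchovies" },
  1,
  2
);
console.log(disjoint, pizza);
```

Comparing values through JSON.stringify keeps the sketch short but is sensitive to key order inside nested objects.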

Re: My CouchDB feature wish number 1: partial updating

Posted by Michael Hendricks <mi...@ndrix.org>.
On Wed, May 07, 2008 at 12:00:33AM +0200, Ralf Nieuwenhuijsen wrote:
>   - updates solve the replication issue completely; since updates can
> easily be merged. (it is the assumed behavior)

I think there could still be replication conflicts with an update
approach.  For instance, if the original document has

    "pizza" : "cheese"

and in one database I change it to

    "pizza" : "pepperoni"

and in another I change it to

    "pizza" : "anchovies"

one still encounters a conflict.  Those two changes can't both succeed.
Perhaps I've overlooked some way to work around this.

-- 
Michael

Re: My CouchDB feature wish number 1: partial updating

Posted by Ralf Nieuwenhuijsen <ra...@gmail.com>.
As a user, I must say, this sounds like an incredible idea.

You neglected to mention a few extra advantages though:
  - updates could in theory be revision-less; that is, you don't
specify a revision.
  - updates could just be the default behavior. Removing keys would be
done by setting them to undefined.
  - updates solve the replication issue completely, since updates can
easily be merged (that is the assumed behavior).

One step beyond, you could even say the document-map is a tree. Why
even speak of documents and databases?
Imagine access like this:

  GET couchserver/mydatabase/mydocument/pizzas/eaten

To get just part of the document, or to update just part of it.
Deleting all documents in the database:

  PUT couchserver/mydatabase

  {}

Deleting a specific key from a document:

  DELETE couchserver/mydatabase/mydocument/pizzas
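These URL examples can be modelled as one tree in which each path segment selects a child node. A toy in-memory sketch, where `handle` and the `server` object are hypothetical and no real HTTP is involved: GET reads a node, PUT replaces the node at the path wholesale (creating missing parents), and DELETE removes one key.

```javascript
// Hypothetical in-memory model: the whole server is one tree, and a URL path
// such as /mydatabase/mydocument/pizzas/eaten addresses a node inside it.
function handle(root, method, urlPath, body) {
  const keys = urlPath.split("/").filter(Boolean);
  const last = keys.pop();
  if (last === undefined) return { status: 400 }; // the root itself is not addressable here
  let parent = root;
  for (const key of keys) {
    if (typeof parent[key] !== "object" || parent[key] === null) {
      if (method !== "PUT") return { status: 404 };
      parent[key] = {}; // PUT creates missing parent nodes
    }
    parent = parent[key];
  }
  switch (method) {
    case "GET":
      return last in parent ? { status: 200, body: parent[last] } : { status: 404 };
    case "PUT":
      parent[last] = body; // replace whatever was at this path
      return { status: 200 };
    case "DELETE":
      if (!(last in parent)) return { status: 404 };
      delete parent[last];
      return { status: 200 };
    default:
      return { status: 405 };
  }
}

const server = {
  mydatabase: {
    mydocument: { pizzas: { eaten: 20, daily_avarage: 0.01 } },
    otherdocument: { pizzas_eaten: 3 },
  },
};

const got = handle(server, "GET", "/mydatabase/mydocument/pizzas/eaten"); // { status: 200, body: 20 }
const deleted = handle(server, "DELETE", "/mydatabase/mydocument/pizzas"); // removes one key
const gone = handle(server, "GET", "/mydatabase/mydocument/pizzas/eaten"); // { status: 404 }
const wiped = handle(server, "PUT", "/mydatabase", {}); // every document in mydatabase is gone
console.log(got, deleted, gone, wiped, server);
```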

Instead of storing old revisions, we could just log changes/updates
with a timestamp.
When replicating, the logs are merged and replayed based on their timestamps.
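A minimal sketch of that replication scheme, assuming each log entry carries a timestamp, the ID of the replica that wrote it, an operation ("set" or "delete") and a dotted path. The entry format and the `replay`/`mergeLogs` names are hypothetical. Ties on the timestamp are broken by replica ID so that every replica replays the same order.

```javascript
// Hypothetical change-log replication. Each entry: { ts, replica, op, path, value }.
function replay(entries) {
  const doc = {};
  // Order by timestamp; break ties by replica id so every replica agrees.
  const ordered = [...entries].sort(
    (x, y) => x.ts - y.ts || x.replica.localeCompare(y.replica)
  );
  for (const { op, path, value } of ordered) {
    const keys = path.split(".");
    const leaf = keys.pop();
    let node = doc;
    for (const key of keys) {
      if (typeof node[key] !== "object" || node[key] === null) {
        if (op === "delete") { node = null; break; } // nothing there to delete
        node[key] = {};
      }
      node = node[key];
    }
    if (node === null) continue;
    if (op === "set") node[leaf] = value;
    else if (op === "delete") delete node[leaf];
  }
  return doc;
}

// Merging two replicas' logs is just replaying their union.
const mergeLogs = (logA, logB) => replay([...logA, ...logB]);

const logA = [
  { ts: 1, replica: "A", op: "set", path: "pizzas.eaten", value: 15 },
  { ts: 3, replica: "A", op: "set", path: "pizzas.eaten", value: 20 },
];
const logB = [
  { ts: 2, replica: "B", op: "set", path: "pizzas.daily_avarage", value: 0.01 },
  { ts: 4, replica: "B", op: "delete", path: "pizzas.eaten" },
];

const m1 = mergeLogs(logA, logB);
const m2 = mergeLogs(logB, logA);
console.log(JSON.stringify(m1), JSON.stringify(m2));
```

Because the merged log is sorted before replay, mergeLogs(logA, logB) and mergeLogs(logB, logA) produce the same document. The scheme does rely on replicas having reasonably synchronized clocks, since the timestamps decide which write wins.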

Greetings,
Ralf N.

2008/5/5 Guby <gu...@gmail.com>:
>  It would greatly improve CouchDB's usability in my case though!
>
>  Let me know what you think!
>
>  Best regards
>  Sebastian
>
>

Re: My CouchDB feature wish number 1: partial updating

Posted by Daniel Yokomizo <da...@gmail.com>.
On Tue, May 6, 2008 at 10:11 PM, Guby <gu...@gmail.com> wrote:
>> In REST it's necessary that PUT has paste over semantics (i.e. update
>> or insert if doesn't exist). Using it for partial updates is
>> incorrect. As the HTTP spec defines PUT with this semantics all the
>> clients, proxies, etc., assume it, so we have to follow it precisely.
>> OTOH the HTTP verbs are extensible so we can define our own verbs to
>> do operations with other semantics. There's already a PATCH proposal
>> allowing partial updates of resources, you can see a blog entry about
>> the issues (from the proposal author) here:
>> http://www.snellspace.com/wp/?p=894.
>>
>> Best regards,
>> Daniel Yokomizo.
>
> Thanks for the comment! I didn't think of the consequences when misusing the
> PUT verb.
> The PATCH method draft was posted in January, and the latest comment in the
> blog entry is from the end of february, does anybody know how far they have
> come?

Joe Gregorio seems to be trying it at Google
(http://bitworking.org/news/298/Considering-PATCH), but the whole
thing is still in flux.

> And is there a probability that PATCH will be implemented in CouchDB?
>
> Best regards
> Sebastian

Best regards,
Daniel Yokomizo.

Re: My CouchDB feature wish number 1: partial updating

Posted by Guby <gu...@gmail.com>.
> In REST it's necessary that PUT has paste over semantics (i.e. update
> or insert if doesn't exist). Using it for partial updates is
> incorrect. As the HTTP spec defines PUT with this semantics all the
> clients, proxies, etc., assume it, so we have to follow it precisely.
> OTOH the HTTP verbs are extensible so we can define our own verbs to
> do operations with other semantics. There's already a PATCH proposal
> allowing partial updates of resources, you can see a blog entry about
> the issues (from the proposal author) here:
> http://www.snellspace.com/wp/?p=894.
>
> Best regards,
> Daniel Yokomizo.

Thanks for the comment! I didn't think of the consequences of
misusing the PUT verb.
The PATCH method draft was posted in January, and the latest comment
in the blog entry is from the end of February. Does anybody know how
far they have come?
And is there a chance that PATCH will be implemented in CouchDB?

Best regards
Sebastian


Re: My CouchDB feature wish number 1: partial updating

Posted by Guby <gu...@gmail.com>.
> 2008/5/7 Daniel Yokomizo <da...@gmail.com>:
>> In REST it's necessary that PUT has paste over semantics (i.e. update
>> or insert if doesn't exist). Using it for partial updates is
>> incorrect. As the HTTP spec defines PUT with this semantics all the
>> clients, proxies, etc., assume it, so we have to follow it precisely.
>> OTOH the HTTP verbs are extensible so we can define our own verbs to
>> do operations with other semantics. There's already a PATCH proposal
>> allowing partial updates of resources, you can see a blog entry about
>> the issues (from the proposal author) here:
>> http://www.snellspace.com/wp/?p=894.
>
> This is another reason to do partial updates like:
>
> PUT couchserver/database/document/pizzas/eaten
> 20
>
> It would still be valid REST.
Yes, that is true!
I am all for your approach!
But maybe that is not what CouchDB wants to be or do.
It would be really great, though, and it should be backward compatible
with the way it works now too!

> Greetings,
> Ralf.

Best regards
Sebastian

Re: My CouchDB feature wish number 1: partial updating

Posted by Daniel Yokomizo <da...@gmail.com>.
On Wed, May 7, 2008 at 10:56 AM, Ralf Nieuwenhuijsen
<ra...@gmail.com> wrote:
> 2008/5/7 Daniel Yokomizo <da...@gmail.com>:
>>  In REST it's necessary that PUT has paste over semantics (i.e. update
>>  or insert if doesn't exist). Using it for partial updates is
>>  incorrect. As the HTTP spec defines PUT with this semantics all the
>>  clients, proxies, etc., assume it, so we have to follow it precisely.
>>  OTOH the HTTP verbs are extensible so we can define our own verbs to
>>  do operations with other semantics. There's already a PATCH proposal
>>  allowing partial updates of resources, you can see a blog entry about
>>  the issues (from the proposal author) here:
>>  http://www.snellspace.com/wp/?p=894.
>
> This is another reason to do partial updates like:
>
> PUT couchserver/database/document/pizzas/eaten
> 20
>
> It would still be valid REST.

Yes, partial updates are nice, but they're an orthogonal concept. With
PATCH we can atomically update many parts of a document; with partial
updates (but no PATCH) we need a BATCH mechanism to describe several
operations at once. Both are useful: PATCH expresses the default case of
describing the alterations of a single document, and BATCH can be used
to modify several documents, even with partial updates of some.
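To make the BATCH side of that distinction concrete, here is a toy all-or-nothing batch over several in-memory documents: every operation's revision is checked before any change is applied, so one conflict leaves all documents untouched. This is one possible semantics only; `batch` and the integer `_rev` values are hypothetical and not a CouchDB API.

```javascript
// Hypothetical all-or-nothing BATCH over an in-memory store. Each operation
// names a document, the revision it was based on, and the fields to change.
function batch(store, operations) {
  // Check every operation first...
  for (const { id, rev } of operations) {
    if (!(id in store)) return { ok: false, error: "not_found", id };
    if (store[id]._rev !== rev) return { ok: false, error: "conflict", id };
  }
  // ...and only then apply them all.
  for (const { id, changes } of operations) {
    Object.assign(store[id], changes); // partial update of this document
    store[id]._rev += 1;
  }
  return { ok: true };
}

const store = {
  a: { _rev: 1, pizzas_eaten: 15 },
  b: { _rev: 1, days: [1, 2, 3] },
};

// b is stale (based on a revision it isn't at), so nothing is changed.
const failed = batch(store, [
  { id: "a", rev: 1, changes: { pizzas_eaten: 20 } },
  { id: "b", rev: 2, changes: { days: [1, 2, 3, 4] } },
]);
const afterFailure = store.a.pizzas_eaten; // still 15

const applied = batch(store, [
  { id: "a", rev: 1, changes: { pizzas_eaten: 20 } },
  { id: "b", rev: 1, changes: { days: [1, 2, 3, 4] } },
]);
console.log(failed, afterFailure, applied, store);
```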

> Greetings,
> Ralf.

Best regards,
Daniel Yokomizo.

Re: My CouchDB feature wish number 1: partial updating

Posted by Ralf Nieuwenhuijsen <ra...@gmail.com>.
2008/5/7 Daniel Yokomizo <da...@gmail.com>:
>  In REST it's necessary that PUT has paste over semantics (i.e. update
>  or insert if doesn't exist). Using it for partial updates is
>  incorrect. As the HTTP spec defines PUT with this semantics all the
>  clients, proxies, etc., assume it, so we have to follow it precisely.
>  OTOH the HTTP verbs are extensible so we can define our own verbs to
>  do operations with other semantics. There's already a PATCH proposal
>  allowing partial updates of resources, you can see a blog entry about
>  the issues (from the proposal author) here:
>  http://www.snellspace.com/wp/?p=894.

This is another reason to do partial updates like:

PUT couchserver/database/document/pizzas/eaten
20

It would still be valid REST.

Greetings,
Ralf.

Re: My CouchDB feature wish number 1: partial updating

Posted by Daniel Yokomizo <da...@gmail.com>.
On Mon, May 5, 2008 at 2:58 PM, Guby <gu...@gmail.com> wrote:
> It would greatly improve CouchDB's usability in my case though!
>
> Let me know what you think!

In REST it's necessary that PUT has paste-over semantics (i.e. update,
or insert if it doesn't exist). Using it for partial updates is
incorrect. As the HTTP spec defines PUT with these semantics, all
clients, proxies, etc. assume it, so we have to follow it precisely.
OTOH the HTTP verbs are extensible, so we can define our own verbs to
do operations with other semantics. There's already a PATCH proposal
allowing partial updates of resources; you can see a blog entry about
the issues (from the proposal author) here:
http://www.snellspace.com/wp/?p=894.
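A toy model of the distinction drawn here, using an in-memory map in place of a real HTTP server (`put`, `patch` and `resources` are hypothetical names): after a PUT the stored resource is exactly the request body, while a PATCH-style update changes only the fields it lists.

```javascript
// Hypothetical in-memory resource store contrasting the two semantics.
const resources = new Map();
const copy = (value) => JSON.parse(JSON.stringify(value));

// PUT ("paste over"): afterwards the resource is exactly `body`, whether it
// existed before or not. Repeating the same PUT changes nothing (idempotent).
function put(id, body) {
  resources.set(id, copy(body));
}

// PATCH-style partial update: only the listed top-level fields change.
function patch(id, changes) {
  if (!resources.has(id)) throw new Error(`no resource ${id}`);
  resources.set(id, { ...resources.get(id), ...copy(changes) });
}

put("foo", { pizzas_eaten: 15, days: [1, 2, 3] });
patch("foo", { pizzas_eaten: 20 });
const afterPatch = copy(resources.get("foo")); // days is kept

put("foo", { pizzas_eaten: 20 });
const afterPut = resources.get("foo"); // days is gone
console.log(afterPatch, afterPut);
```

Under these semantics, a client that sent only `{ pizzas_eaten: 20 }` with PUT would silently drop `days`, which is why a partial body needs a different verb.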

> Best regards
> Sebastian

Best regards,
Daniel Yokomizo.