Posted to user@couchdb.apache.org by Michael Ramirez <mi...@yahoo.com> on 2008/11/13 17:20:30 UTC

Document Updates

When updating documents must the entire document be resent or just the changed fields?

Michael

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Thu, Nov 13, 2008 at 04:13:44PM -0500, Paul Davis wrote:
> On Thu, Nov 13, 2008 at 4:04 PM, Noah Slater <ns...@apache.org> wrote:
> > On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
> >> How hard can it be to change an RFC?
> >
> > I hope that's humour! :p
> >
>
> It's only *one* word! :D

Heh. Well, I've been waiting for about three months for a simple update to the
Atom Relationships record with IANA. I doubt the IETF is any faster. In
addition, I should imagine it would have to be a revised standard, and these
things take ages. If you wanna do it, and get your name on the RFC, well...

I still think it's best to see what other effort has been done in this area.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

On Thu, Nov 13, 2008 at 4:04 PM, Noah Slater <ns...@apache.org> wrote:
> On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
>> How hard can it be to change an RFC?
>
> I hope that's humour! :p
>

It's only *one* word! :D

> --
> Noah Slater, http://tumbolia.org/nslater
>

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Thu, Nov 13, 2008 at 02:06:30PM -0500, Paul Davis wrote:
> How hard can it be to change an RFC?

I hope that's humour! :p

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

On Thu, Nov 13, 2008 at 11:43 AM, Noah Slater <ns...@apache.org> wrote:
> On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
>> I think that this should be pretty easily done using:
>> a) well defined pretty format output
>> b) standard diff
>>
>> The reason for (a) is that you need this to get line breaks, which are
>> critical to diffing correctly.
>
> It's a bit more complex than that; canonicalised JSON is still in its infancy,
> so we would have to get the community to adopt that first. I know that people
> have been discussing JSON diffs before, may be worth looking up what's already
> been done on this.
>
> --
> Noah Slater, http://tumbolia.org/nslater
>
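
The pretty-print-then-diff approach quoted above can be sketched as follows. This is a minimal illustration using Python's standard `json` and `difflib` modules, and it assumes that sorted keys plus a fixed indent are an acceptable stand-in for a "well defined pretty format"; the helper names `pretty` and `text_diff` are made up for the example.

```python
import difflib
import json


def pretty(obj):
    # Sorted keys and a fixed indent give a stable, line-oriented rendering.
    return json.dumps(obj, sort_keys=True, indent=2).splitlines()


def text_diff(old, new):
    # Standard unified diff over the pretty-printed lines.
    return list(difflib.unified_diff(pretty(old), pretty(new),
                                     fromfile="old", tofile="new", lineterm=""))


old_doc = {"title": "Hello", "tags": ["a", "b"]}
new_doc = {"tags": ["a", "b"], "title": "Hello, world"}

diff_lines = text_diff(old_doc, new_doc)
print("\n".join(diff_lines))
```

Because the keys are sorted before diffing, the reordering of `tags` and `title` in `new_doc` produces no noise; only the changed `title` line shows up.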

I think the JSON diff is a great idea. Unfortunately, the RFC is a bit
worrisome in one respect:

Section 2.2:
'The names within an object SHOULD be unique.'

I think that this could be a pretty big stumbling block if different
parsers start taking different interpretations of that and I've already
seen implementations do slightly different things with repeated field
names.
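
To make the concern concrete, here is a small sketch of what a MUST-style parser could look like next to a lenient one. It uses the `object_pairs_hook` parameter of Python's standard `json.loads`; the names `reject_duplicates` and `strict_loads` are hypothetical helpers written for this example, not part of any library discussed in the thread.

```python
import json


def reject_duplicates(pairs):
    # pairs holds the (name, value) tuples of one JSON object, in document order.
    seen = set()
    for name, _ in pairs:
        if name in seen:
            raise ValueError("duplicate name: %r" % name)
        seen.add(name)
    return dict(pairs)


def strict_loads(text):
    # Treats the RFC's SHOULD as a MUST: repeated names are an error.
    return json.loads(text, object_pairs_hook=reject_duplicates)


# The default parser silently keeps the last value for a repeated name.
lenient = json.loads('{"a": 1, "a": 2}')

try:
    strict_loads('{"a": 1, "a": 2}')
    rejected = False
except ValueError:
    rejected = True

print(lenient, rejected)
```

The hook runs once per object, so the same name may still appear at different nesting levels without being rejected.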

Contemplating the canonical spec I think I would prefer updating the
JSON spec's use of SHOULD to MUST. The canonical thing to me seems
more like a normalization method as opposed to a hard and fast spec.
Specifically I could see lots of wasted cycles spent on keeping
canonicalization when it's needed relatively infrequently.

Assuming the change to MUST, I could probably write and implement a
first draft of the spec in a day.

How hard can it be to change an RFC?

Paul

Re: Document Updates

Posted by "ara.t.howard" <ar...@gmail.com>.

On Nov 13, 2008, at 5:09 PM, Chris Anderson wrote:

>
> Forgive me for throwing out a loose-cannon idea, but would it be
> easiest to provide an API where the user sends a Javascript function
> to CouchDB via the PATCH method? The function could look something
> like:
>
> function(doc) {
>  doc.my_field = "new value";
>  doc.existing_array[3] = "another new value";
>  doc.new_array = ["a", "b", 3];
>  return doc;
> }
>
> --  

yeah - i loooooooove this idea!

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

On Fri, Nov 14, 2008 at 8:37 PM, Paul Davis <pa...@gmail.com> wrote:
> I wrote a fuzz thing to go along with the diff testing.
>
> You can get it with:
>
> $ sudo easy_install jsontools
>
> # Examples
> from StringIO import StringIO
> import jsontools
> stream = StringIO()
>
> # Fuzzy objects
> fj = jsontools.FuzzyJson()
> obj1 = fj.generate(1).next()
> obj2 = fj.modify(obj1)
>
> # Diff the objects
> jsontools.jsondiff(obj1, obj2, stream=stream)
>
> # Apply the diff
> stream.seek(0)
> result = jsontools.jsonapply(stream, obj1)
>
> # Compare them
> assert jsontools.jsoncmp(result, obj2) == 2
>

== True

> Any comments?
>
> Paul
>
> On Thu, Nov 13, 2008 at 9:34 PM, Paul Davis <pa...@gmail.com> wrote:
>> I don't think we need canonical JSON.
>>
>> The Spec definitely needs to be disambiguated though. As I see it
>> there are two interpretations:
>>
>> 1. Order of fields matters which means repeated fields are ok
>> 2. Order does not matter which means repeated fields are NOT ok
>>
>> It doesn't matter which is chosen, but one of them must be to make this work.
>>
>> Also, I got bored. So I implemented JSON diff in python for Case #2.
>>
>> http://www.davispj.com/svn/projects/json-diff/json-diff.py
>>
>> I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
>> library and then pound the diff thing with it.
>>
>> Not sure if it's obvious or not, but switching from case 2 to 1 is
>> straightforward. Also, my current array diff implementation is kinda
>> whack. And indels screw the rest of the diff, as in, it's not so much a
>> diff as a delete rest and add new. Getting this optimal is actually an
>> N^2 runtime algorithm via dynamic programming (smith-waterman style)
>>
>> Also, do note that the erlang parser and python (and i assume ruby is
>> in the python boat) have different behaviors in respect to the 2
>> cases. Erlang is Case 1, python is case 2.
>>
>> Paul
>>
>>
>> On Thu, Nov 13, 2008 at 8:20 PM, Chris Anderson <jc...@apache.org> wrote:
>>> On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard <ar...@gmail.com> wrote:
>>>>
>>>> On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
>>>>
>>>>> You could use the view mechanism, and attach a "language" attribute, and
>>>>> have this be a general transformation interface, which would indeed be very
>>>>> nice. For efficiency you would want to apply this over sets of documents,
>>>>> and probably in a transactional context like bulk update does now.
>>>>>
>>>>> However... Damien wants something to use in replication, which would mean
>>>>> that javascript would then become a required, rather than an optional part
>>>>> of Couch, because replication would require it (unless you made the
>>>>> replication diff generator pluggable ... but why go there?). The benefit of
>>>>> the declarative diff format is that applying a diff can be done within
>>>>> Couch.
>>>>
>>>> couldn't these queries run in the view server?  in fact any mechanism which
>>>> would allow the view server could accomplish this with a protocol between it
>>>> and the db server.  basically it's an addition to the map/reduce
>>>> functionality which would alter documents on the fly.
>>>>
>>>
>>> Antony's right that replication currently does not depend on the
>>> availability of the view server. And I think it is smart to avoid that
>>> dependence, when possible.
>>>
>>> Alas, my attempt to bypass all the craziness that is canonical JSON,
>>> has come short of that. Oh wells...
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>>
>>
>

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

I wrote a fuzz thing to go along with the diff testing.

You can get it with:

$ sudo easy_install jsontools

# Examples
from StringIO import StringIO
import jsontools
stream = StringIO()

# Fuzzy objects
fj = jsontools.FuzzyJson()
obj1 = fj.generate(1).next()
obj2 = fj.modify(obj1)

# Diff the objects
jsontools.jsondiff(obj1, obj2, stream=stream)

# Apply the diff
stream.seek(0)
result = jsontools.jsonapply(stream, obj1)

# Compare them
assert jsontools.jsoncmp(result, obj2) == 2

Any comments?

Paul

On Thu, Nov 13, 2008 at 9:34 PM, Paul Davis <pa...@gmail.com> wrote:
> I don't think we need canonical JSON.
>
> The Spec definitely needs to be disambiguated though. As I see it
> there are two interpretations:
>
> 1. Order of fields matters which means repeated fields are ok
> 2. Order does not matter which means repeated fields are NOT ok
>
> It doesn't matter which is chosen, but one of them must be to make this work.
>
> Also, I got bored. So I implemented JSON diff in python for Case #2.
>
> http://www.davispj.com/svn/projects/json-diff/json-diff.py
>
> I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
> library and then pound the diff thing with it.
>
> Not sure if it's obvious or not, but switching from case 2 to 1 is
> straightforward. Also, my current array diff implementation is kinda
> whack. And indels screw the rest of the diff, as in, it's not so much a
> diff as a delete rest and add new. Getting this optimal is actually an
> N^2 runtime algorithm via dynamic programming (smith-waterman style)
>
> Also, do note that the erlang parser and python (and i assume ruby is
> in the python boat) have different behaviors in respect to the 2
> cases. Erlang is Case 1, python is case 2.
>
> Paul
>
>
> On Thu, Nov 13, 2008 at 8:20 PM, Chris Anderson <jc...@apache.org> wrote:
>> On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard <ar...@gmail.com> wrote:
>>>
>>> On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
>>>
>>>> You could use the view mechanism, and attach a "language" attribute, and
>>>> have this be a general transformation interface, which would indeed be very
>>>> nice. For efficiency you would want to apply this over sets of documents,
>>>> and probably in a transactional context like bulk update does now.
>>>>
>>>> However... Damien wants something to use in replication, which would mean
>>>> that javascript would then become a required, rather than an optional part
>>>> of Couch, because replication would require it (unless you made the
>>>> replication diff generator pluggable ... but why go there?). The benefit of
>>>> the declarative diff format is that applying a diff can be done within
>>>> Couch.
>>>
>>> couldn't these queries run in the view server?  in fact any mechanism which
>>> would allow the view server could accomplish this with a protocol between it
>>> and the db server.  basically it's an addition to the map/reduce
>>> functionality which would alter documents on the fly.
>>>
>>
>> Antony's right that replication currently does not depend on the
>> availability of the view server. And I think it is smart to avoid that
>> dependence, when possible.
>>
>> Alas, my attempt to bypass all the craziness that is canonical JSON,
>> has come short of that. Oh wells...
>>
>> --
>> Chris Anderson
>> http://jchris.mfdz.com
>>
>

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 2:36 PM, Noah Slater wrote:

> On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
>>> 1. Order of fields matters which means repeated fields are ok
>>> 2. Order does not matter which means repeated fields are NOT ok
>>
>> Given that JSON is executable Javascript, 2 is the only  
>> interpretation
>> that allows for roundtrip equivalence.
>
> I fear this is rather a large jump to conclusion.

I'm only claiming that *roundtrip equivalence* is only possible if you  
don't allow duplicate keys, because JSON being a serialization format  
for (limited) Javascript data structures, cannot be generated with  
duplicate keys from those data structures.

Being executable, its interpretation is defined by the Javascript  
spec. JSON is a serialization of a (limited) Javascript data  
structure. Javascript hashes don't allow for duplicate keys, nor do  
they (AFAIR) provide any ordering guarantees. I contend that the  
semantics of JSON follow from this, and as an extension I wonder if  
any JSON that isn't a serialization of some Javascript data structure  
should not be valid JSON. OTOH, the operational interpretation would  
suggest that maybe the text representation can have duplicate keys,  
but that the data structure that it represents (which is what we  
should care about) does not.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Some defeats are instalments to victory.
   -- Jacob Riis

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 01:22:58PM +1030, Antony Blakey wrote:
>> 1. Order of fields matters which means repeated fields are ok
>> 2. Order does not matter which means repeated fields are NOT ok
>
> Given that JSON is executable Javascript, 2 is the only interpretation
> that allows for roundtrip equivalence.

I fear this is rather a large jump to conclusion.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

The JS PATCH function is definitely an interesting idea, but I don't think
it is going to fix the issue. How many use cases are going to
be able to reduce an update operation to a function?

Most use cases that I see are of the nature: "Download JSON,
deserialize to native language, mutate, serialize, send to server".
Anyone that wants to write an abstract "mutate" -> JS function thing has
props in my book.

It is foreseeable that you could treat it as a stored procedure though.
As in the function signature becomes "function(doc, input)" and input
is a complex JSON object that is used in the update. This almost has
actual use in terms of the earlier thread on transaction semantics
that I assume would be implementable. But really the transaction idea
is a whole new can of worms, since couch would have to be allowed
to request multiple documents in one transaction etc.
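
The "function(doc, input)" shape described above can be sketched like this. It is an illustration only: the in-memory `store` dict, `run_update`, and `add_comment` are all hypothetical names standing in for whatever the server would actually provide, and no CouchDB API is implied.

```python
import copy


def add_comment(doc, params):
    # Hypothetical update function: takes the stored document plus a JSON
    # payload and returns the new document.
    doc.setdefault("comments", []).append(params["comment"])
    doc["comment_count"] = len(doc["comments"])
    return doc


def run_update(store, doc_id, update, params):
    # Work on a copy so that a failing update leaves the stored document untouched.
    working = copy.deepcopy(store[doc_id])
    store[doc_id] = update(working, params)
    return store[doc_id]


store = {"post-1": {"title": "Hello", "comments": []}}
result = run_update(store, "post-1", add_comment, {"comment": "First!"})
print(result)
```

The client only sends the small `params` object; the full document never leaves the server, which is the appeal of the stored-procedure reading.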

I think Noah's original suggestion is the best bet: get a diff
system set up well outside of couch, in JSON land, that's implementable
in any language for anything.

Paul

On Thu, Nov 13, 2008 at 11:53 PM, ara.t.howard <ar...@gmail.com> wrote:
>
> On Nov 13, 2008, at 9:18 PM, Ayende Rahien wrote:
>
>> Take into account that the view server is explicitly a separate
>> process. Requiring it to process incoming requests would create a very
>> high overhead.
>
> i'm just saying - if people could write javascript to execute on the server
> people would really be singing hallelujah.  from my perspective it's only
> about 10000000% as cool as being able to plug a different language in as a view
> server.
>
> 2 cts.
>
>
> a @ http://codeforpeople.com/
> --
> we can deny everything, except that we have the possibility of being better.
> simply reflect on that.
> h.h. the 14th dalai lama
>
>
>
>

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 1:04 PM, Paul Davis wrote:

> I don't think we need canonical JSON.
>
> The Spec definitely needs to be disambiguated though. As I see it
> there are two interpretations:
>
> 1. Order of fields matters which means repeated fields are ok
> 2. Order does not matter which means repeated fields are NOT ok

Given that JSON is executable Javascript, 2 is the only interpretation  
that allows for roundtrip equivalence.

> Not sure if it's obvious or not, but switching from case 2 to 1 is
> straightforward. Also, my current array diff implementation is kinda
> whack. And indels screw the rest of the diff, as in, it's not so much a
> diff as a delete rest and add new. Getting this optimal is actually an
> N^2 runtime algorithm via dynamic programming (smith-waterman style)

The algorithm described here: http://www.springerlink.com/content/r1t6h8631868k615/ 
  is O(n), and although it isn't optimal, I'm guessing the performance  
stability makes up for it.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?
His goal: transcend dental medication.

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

I don't think we need canonical JSON.

The Spec definitely needs to be disambiguated though. As I see it
there are two interpretations:

1. Order of fields matters which means repeated fields are ok
2. Order does not matter which means repeated fields are NOT ok

It doesn't matter which is chosen, but one of them must be to make this work.
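
The two interpretations can be shown side by side with Python's standard `json` module. This is a sketch for illustration: passing `list` as `object_pairs_hook` is just one way to model Case 1, and it says nothing about how the Erlang parser represents objects internally.

```python
import json

text = '{"b": 1, "a": 2, "b": 3}'

# Case 1: keep every (name, value) pair in document order, repeats included.
case1 = json.loads(text, object_pairs_hook=list)

# Case 2: an unordered mapping; a repeated name collapses to a single entry.
case2 = json.loads(text)

print(case1)
print(case2)
```

Under Case 2 the document compares equal to `{"a": 2, "b": 3}` regardless of key order, whereas under Case 1 the order and the repeat are both part of the value.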

Also, I got bored. So I implemented JSON diff in python for Case #2.

http://www.davispj.com/svn/projects/json-diff/json-diff.py

I gotta jet, but when I get home in a bit I'm gonna write a JSON fuzz
library and then pound the diff thing with it.

Not sure if it's obvious or not, but switching from case 2 to 1 is
straightforward. Also, my current array diff implementation is kinda
whack. And indels screw the rest of the diff, as in, it's not so much a
diff as a delete rest and add new. Getting this optimal is actually an
N^2 runtime algorithm via dynamic programming (smith-waterman style)
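
For reference, a quadratic dynamic-programming array diff can be sketched as below. Note the swap: this uses a plain longest-common-subsequence table rather than Smith-Waterman alignment, and it is not the code in json-diff.py; `lcs_table`, `array_diff`, and `apply_ops` are names invented for the example.

```python
def lcs_table(a, b):
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def array_diff(a, b):
    # Walk the table, emitting ("keep" | "del" | "ins", value) operations.
    table = lcs_table(a, b)
    ops, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("del", a[i]))
            i += 1
        else:
            ops.append(("ins", b[j]))
            j += 1
    ops.extend(("del", x) for x in a[i:])
    ops.extend(("ins", x) for x in b[j:])
    return ops


def apply_ops(ops):
    # Rebuild the target array from the edit script.
    return [value for op, value in ops if op in ("keep", "ins")]


old = ["a", "b", "c", "d"]
new = ["a", "x", "b", "c", "d"]
ops = array_diff(old, new)
print(ops)
```

With this approach a single insertion near the front yields one `ins` operation instead of a "delete rest and add new", at a cost of O(len(a) * len(b)) time and space.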

Also, do note that the erlang parser and python (and i assume ruby is
in the python boat) have different behaviors in respect to the 2
cases. Erlang is Case 1, python is case 2.

Paul

On Thu, Nov 13, 2008 at 8:20 PM, Chris Anderson <jc...@apache.org> wrote:
> On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard <ar...@gmail.com> wrote:
>>
>> On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
>>
>>> You could use the view mechanism, and attach a "language" attribute, and
>>> have this be a general transformation interface, which would indeed be very
>>> nice. For efficiency you would want to apply this over sets of documents,
>>> and probably in a transactional context like bulk update does now.
>>>
>>> However... Damien wants something to use in replication, which would mean
>>> that javascript would then become a required, rather than an optional part
>>> of Couch, because replication would require it (unless you made the
>>> replication diff generator pluggable ... but why go there?). The benefit of
>>> the declarative diff format is that applying a diff can be done within
>>> Couch.
>>
>> couldn't these queries run in the view server?  in fact any mechanism which
>> would allow the view server could accomplish this with a protocol between it
>> and the db server.  basically it's an addition to the map/reduce
>> functionality which would alter documents on the fly.
>>
>
> Antony's right that replication currently does not depend on the
> availability of the view server. And I think it is smart to avoid that
> dependence, when possible.
>
> Alas, my attempt to bypass all the craziness that is canonical JSON,
> has come short of that. Oh wells...
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 11:50 AM, Chris Anderson wrote:

> Alas, my attempt to bypass all the craziness that is canonical JSON,
> has come short of that. Oh wells...

My proposal doesn't require canonical JSON because it is structural  
rather than textual. That's one reason I think it's a good approach.
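
The proposal itself is not reproduced in this thread, so the following is only a generic sketch of what "structural rather than textual" can mean: the diff is computed over parsed values and addressed by key path, so key order and whitespace never matter. The operation names and the `structural_diff` / `apply_diff` helpers are assumptions made for the example.

```python
import copy


def structural_diff(old, new, path=()):
    # Compare parsed values, not text; returns a list of (op, path, value).
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in sorted(set(old) | set(new)):
            if key not in new:
                ops.append(("remove", path + (key,), None))
            elif key not in old:
                ops.append(("add", path + (key,), new[key]))
            else:
                ops.extend(structural_diff(old[key], new[key], path + (key,)))
        return ops
    if old != new:
        # Arrays and scalars are replaced wholesale in this sketch.
        return [("replace", path, new)]
    return []


def apply_diff(doc, ops):
    # Apply the operations to a deep copy of doc and return the result.
    result = copy.deepcopy(doc)
    for op, path, value in ops:
        if not path:
            result = copy.deepcopy(value)  # whole-document replacement
            continue
        parent = result
        for key in path[:-1]:
            parent = parent[key]
        if op == "remove":
            del parent[path[-1]]
        else:
            parent[path[-1]] = copy.deepcopy(value)
    return result


old = {"title": "Hello", "meta": {"rev": 1, "draft": True}, "tags": ["a"]}
new = {"meta": {"rev": 2}, "title": "Hello", "tags": ["a"], "author": "paul"}
ops = structural_diff(old, new)
print(ops)
```

Since the operations are plain data, applying them needs no scripting language on the receiving side.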

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

One should respect public opinion insofar as is necessary to avoid  
starvation and keep out of prison, but anything that goes beyond this  
is voluntary submission to an unnecessary tyranny.
   -- Bertrand Russell

Re: Document Updates

Posted by Chris Anderson <jc...@apache.org>.

On Thu, Nov 13, 2008 at 5:02 PM, ara.t.howard <ar...@gmail.com> wrote:
>
> On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
>
>> You could use the view mechanism, and attach a "language" attribute, and
>> have this be a general transformation interface, which would indeed be very
>> nice. For efficiency you would want to apply this over sets of documents,
>> and probably in a transactional context like bulk update does now.
>>
>> However... Damien wants something to use in replication, which would mean
>> that javascript would then become a required, rather than an optional part
>> of Couch, because replication would require it (unless you made the
>> replication diff generator pluggable ... but why go there?). The benefit of
>> the declarative diff format is that applying a diff can be done within
>> Couch.
>
> couldn't these queries run in the view server?  in fact any mechanism which
> would allow the view server could accomplish this with a protocol between it
> and the db server.  basically it's an addition to the map/reduce
> functionality which would alter documents on the fly.
>

Antony's right that replication currently does not depend on the
availability of the view server. And I think it is smart to avoid that
dependence, when possible.

Alas, my attempt to bypass all the craziness that is canonical JSON,
has come short of that. Oh wells...

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Document Updates

Posted by "ara.t.howard" <ar...@gmail.com>.

On Nov 13, 2008, at 9:18 PM, Ayende Rahien wrote:

> Take into account that the view server is explicitly a separate
> process. Requiring it to process incoming requests would create a very
> high overhead.

i'm just saying - if people could write javascript to execute on the  
server people would really be singing hallelujah.  from my perspective  
it's only about 10000000% as cool as being able to plug a different  
language in as a view server.

2 cts.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: Document Updates

Posted by Ayende Rahien <ay...@ayende.com>.

Take into account that the view server is explicitly a separate
process. Requiring it to process incoming requests would create a very
high overhead.

On Fri, Nov 14, 2008 at 3:02 AM, ara.t.howard <ar...@gmail.com> wrote:

>
> On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:
>
>> You could use the view mechanism, and attach a "language" attribute, and
>> have this be a general transformation interface, which would indeed be very
>> nice. For efficiency you would want to apply this over sets of documents,
>> and probably in a transactional context like bulk update does now.
>>
>> However... Damien wants something to use in replication, which would mean
>> that javascript would then become a required, rather than an optional part
>> of Couch, because replication would require it (unless you made the
>> replication diff generator pluggable ... but why go there?). The benefit of
>> the declarative diff format is that applying a diff can be done within
>> Couch.
>>
>
> couldn't these queries run in the view server?  in fact any mechanism which
> would allow the view server could accomplish this with a protocol between it
> and the db server.  basically it's an addition to the map/reduce
> functionality which would alter documents on the fly.
>
>
> a @ http://codeforpeople.com/
> --
> we can deny everything, except that we have the possibility of being
> better. simply reflect on that.
> h.h. the 14th dalai lama
>
>
>
>

Re: Document Updates

Posted by "ara.t.howard" <ar...@gmail.com>.

On Nov 13, 2008, at 5:49 PM, Antony Blakey wrote:

> You could use the view mechanism, and attach a "language" attribute,  
> and have this be a general transformation interface, which would  
> indeed be very nice. For efficiency you would want to apply this  
> over sets of documents, and probably in a transactional context like  
> bulk update does now.
>
> However... Damien wants something to use in replication, which would  
> mean that javascript would then become a required, rather than an  
> optional part of Couch, because replication would require it (unless  
> you made the replication diff generator pluggable ... but why go  
> there?). The benefit of the declarative diff format is that applying  
> a diff can be done within Couch.

couldn't these queries run in the view server?  in fact any mechanism  
which would allow the view server could accomplish this with a  
protocol between it and the db server.  basically it's an addition to  
the map/reduce functionality which would alter documents on the fly.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 11:01 AM, Antony Blakey wrote:

>
> On 14/11/2008, at 10:39 AM, Chris Anderson wrote:
>
>> On Thu, Nov 13, 2008 at 3:37 PM, Noah Slater <ns...@apache.org>  
>> wrote:
>>>
>>> I did some digging to see what else is out there:
>>>
>>> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
>>> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
>>> * http://www.snellspace.com/wp/?p=895
>>> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
>>> * http://www.snellspace.com/wp/?p=902
>>
>> Forgive me for throwing out a loose-cannon idea, but would it be
>> easiest to provide an API where the user sends a Javascript function
>> to CouchDB via the PATCH method? The function could look something
>> like:
>>
>> function(doc) {
>> doc.my_field = "new value";
>> doc.existing_array[3] = "another new value";
>> doc.new_array = ["a", "b", 3];
>> return doc;
>> }
>
> I thought that javascript wasn't part of the Couch core? JSON isn't  
> javascript, and all uses of javascript *could* be replaced with e.g.  
> Ruby (or my interest, Smalltalk), which is why there is a "language"  
> attribute on the views.
>
> Your proposal would change that.

You could use the view mechanism, and attach a "language" attribute,  
and have this be a general transformation interface, which would  
indeed be very nice. For efficiency you would want to apply this over  
sets of documents, and probably in a transactional context like bulk  
update does now.

However... Damien wants something to use in replication, which would  
mean that javascript would then become a required, rather than an  
optional part of Couch, because replication would require it (unless  
you made the replication diff generator pluggable ... but why go  
there?). The benefit of the declarative diff format is that applying a  
diff can be done within Couch.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

It's amazing that the one side of the conversation that survived is "I  
don't know art, but I know what I like". The reply from the artist was  
"Madam, so does a cow".
   -- Carl Kirkendall

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 10:39 AM, Chris Anderson wrote:

> On Thu, Nov 13, 2008 at 3:37 PM, Noah Slater <ns...@apache.org>  
> wrote:
>>
>> I did some digging to see what else is out there:
>>
>> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
>> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
>> * http://www.snellspace.com/wp/?p=895
>> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
>> * http://www.snellspace.com/wp/?p=902
>
> Forgive me for throwing out a loose-cannon idea, but would it be
> easiest to provide an API where the user sends a Javascript function
> to CouchDB via the PATCH method? The function could look something
> like:
>
> function(doc) {
>  doc.my_field = "new value";
>  doc.existing_array[3] = "another new value";
>  doc.new_array = ["a", "b", 3];
>  return doc;
> }

I thought that javascript wasn't part of the Couch core? JSON isn't  
javascript, and all uses of javascript *could* be replaced with e.g.  
Ruby (or my interest, Smalltalk), which is why there is a "language"  
attribute on the views.

Your proposal would change that.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

75% of statistics are made up on the spot.

Re: Document Updates

Posted by Chris Anderson <jc...@apache.org>.

On Thu, Nov 13, 2008 at 3:37 PM, Noah Slater <ns...@apache.org> wrote:
>
> I did some digging to see what else is out there:
>
>  * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
>  * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
>  * http://www.snellspace.com/wp/?p=895
>  * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
>  * http://www.snellspace.com/wp/?p=902

Forgive me for throwing out a loose-cannon idea, but would it be
easiest to provide an API where the user sends a Javascript function
to CouchDB via the PATCH method? The function could look something
like:

function(doc) {
  doc.my_field = "new value";
  doc.existing_array[3] = "another new value";
  doc.new_array = ["a", "b", 3];
  return doc;
}

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

Hey,

Just a quick update on this thread. I came across this today:

  http://www.sitepen.com/blog/2008/11/18/when-to-use-persevere-a-comparison-with-couchdb-and-others/

Unfortunately, depressingly inaccurate, but it did lead me to:

  http://persevere.sitepen.com/#about

And in turn:

  http://www.sitepen.com/blog/2008/07/16/jsonquery-data-querying-beyond-jsonpath/

  http://goessner.net/articles/JsonPath/

  http://www.json.com/2007/10/19/json-referencing-proposal-and-library/

Food for thought.

Best,

-- 
Noah Slater, http://tumbolia.org/nslater

Re: RESTful? (was: Re: Document Updates)

Posted by Noah Slater <ns...@apache.org>.

I think it would be extremely beneficial if we made CouchDB provide
self-descriptive hyperlinks that let clients explore the available URI space. Along
with a set of properly defined media types, this could go a long way towards
making CouchDB a truly RESTful database management system.

-- 
Noah Slater, http://tumbolia.org/nslater

RESTful? (was: Re: Document Updates)

Posted by Antony Blakey <an...@gmail.com>.

On 15/11/2008, at 8:26 AM, Antony Blakey wrote:

> A landing page with URLs for the design documents would also be  
> needed. View definitions would need a unique media type because  
> currently their meaning is dependent on their location. But maybe  
> I'm misunderstanding REST. So easy.

Thinking about this, it would not only be RESTful to have the server  
root page contain links such as the _bulk_docs URL and a _design/  
index page, it would also make good documentation if it was an HTML  
page. The name of the anchor or the rel attribute could serve to  
indicate link functions.

   <a href='_bulk_docs' rel='bulkDocumentsRPC'>Bulk Document Operations</a>
   <a href='_design/'>Design Document Index</a>

I'm guessing that it would be wrong to annotate the _design link with  
a rel because a document isn't a design document by virtue of its  
URL, and the _design/ URL is really just a view. This suggests to me  
that maybe _design/ shouldn't be hard-coded, but should be just  
another view defined using the existing mechanism e.g. _view/_design.  
This touches on the recent discussion about design docs being passed  
to views.
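
A client could discover those links instead of hard-coding URL patterns. The sketch below parses the two anchors from the example above with Python's standard `html.parser`; the `LinkCollector` class and `discover` function are hypothetical, and the page content is simply the snippet from this message, not something CouchDB serves.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    # Collects (rel, href) for every <a> element; rel is None when absent.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes.get("rel"), attributes["href"]))


def discover(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


root_page = """
<a href='_bulk_docs' rel='bulkDocumentsRPC'>Bulk Document Operations</a>
<a href='_design/'>Design Document Index</a>
"""

links = discover(root_page)
by_rel = {rel: href for rel, href in links if rel}
print(by_rel)
```

A client written this way looks up `bulkDocumentsRPC` by relation name and follows whatever href the server advertises.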

IMO the reference docs on the Wiki really belong with the code, and an  
obvious feature would be to serve those documents from the server.

I wonder if this idea conforms to this requirement:

"A REST API should spend almost all of its descriptive effort in  
defining the media type(s) used for representing resources and driving  
application state, or in defining extended relation names and/or  
hypertext-enabled mark-up for existing standard media types."

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Borrow money from pessimists - they don't expect it back.
   -- Steven Wright

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 15/11/2008, at 4:10 AM, Chris Anderson wrote:

> But I've been thinking about this as well. If we were to attack this
> problem of "High REST" head-on, I think the appropriate course would
> be to define a media type application/couch+json or something. The
> media type's definition would explain how to get from "id" params to
> document URIs, etc.

I think some equivalent to base-uri would be needed to avoid the  
definition of the media type including requirements on the URL  
structure of the server. Either that or the document would include the  
resource URL.

A landing page with URLs for the design documents would also be  
needed. View definitions would need a unique media type because  
currently their meaning is dependent on their location. But maybe I'm  
misunderstanding REST. So easy.

> Doing that is all it would take to be (mostly)
> RESTful. I think the existence of _bulk_docs POST doesn't break
> RESTfulness, either. There's no law that says a system can't define
> RESTful resources alongside RPC endpoints.

Agreed, but IMO Couch shouldn't claim to be at all RESTful if it  
doesn't meet the criteria. It might be REST-like. If some parts are  
RESTful and others not, then the claim should be that it includes some  
RESTful interfaces.

It might seem nitpicking, but the definition of REST is voided when  
things claim to be RESTful that in fact aren't, and it's rarely used  
correctly. I'm not even sure what conformance looks like.

> I'm not sure how meditating on the Zen of REST will help us get
> json-diffs right, but it sure can't hurt.

Sorry, I distracted the discussion when I mentioned _bulk_docs.

I would start working on an implementation of the apply-end of my diff  
proposal, but I don't want to waste time if the powers-that-be don't  
think it's the right way to go.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Some defeats are instalments to victory.
   -- Jacob Riis

Re: Document Updates

Posted by Chris Anderson <jc...@apache.org>.

On Fri, Nov 14, 2008 at 4:22 AM, Antony Blakey <an...@gmail.com> wrote:
> I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
> there are URI patterns means it's not REST, at least not if I've understood
> Roy's recent communications/frustrations, such as
> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
>
> In particular, point 4 seems to disqualify any system, (including Couch)
> that needs the documents in the "Reference" section of the Wiki.
>
> To be REST it has to be just like the web. Using links discovered from
> documents, never constructing them according to some scheme.
>

Ah, shaving the yak shed.

But I've been thinking about this as well. If we were to attack this
problem of "High REST" head-on, I think the appropriate course would
be to define a media type application/couch+json or something. The
media type's definition would explain how to get from "id" params to
document URIs, etc. Doing that is all it would take to be (mostly)
RESTful. I think the existence of _bulk_docs POST doesn't break
RESTfulness, either. There's no law that says a system can't define
RESTful resources alongside RPC endpoints.

I'm not sure how meditating on the Zen of REST will help us get
json-diffs right, but it sure can't hurt.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 12:35:45PM +0000, Noah Slater wrote:
> Sure, there are some areas, such as hypertext as the engine of application
> state, that CouchDB does not use, but looking back at Roy's original doctoral
> thesis, REST seems to be predominantly about architecture constraints, of
> which this was not one of them. CouchDB embraces all of the mentioned
> constraints in some way or another; namely client/server, statelessness,
> cacheability, uniform interfaces, and layered systems.

Careless wording on my part, hypertext is clearly an architectural
constraint. However, the weight placed on it within his thesis does not seem to
be as great as some of the other core constraints.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 10:52:35PM +1030, Antony Blakey wrote:
>> I am eager to be corrected by any resident RESTafarians. For me, REST is a
>> bit like Zen. Sometimes I think I understand it totally, and other times I'm
>> convinced that I don't understand it at all.
>
> I don't think Couch is truly REST. Certainly _bulk_docs isn't. The fact that
> there are URI patterns means it's not REST, at least not if I've understood
> Roy's recent communications/frustrations, such as
> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
>
> In particular, point 4 seems to disqualify any system, (including Couch) that
> needs the documents in the "Reference" section of the Wiki.
>
> To be REST it has to be just like the web. Using links discovered from
> documents, never constructing them according to some scheme.
>
> However, what does it matter? REST certainly is a slippery sucker, but that
> may be because we want it to be more generally applicable than it is. Couch
> doesn't have to be REST, and I suspect that it in fact cannot be.

Sure, there are some areas, such as hypertext as the engine of application
state, that CouchDB does not use, but looking back at Roy's original doctoral
thesis, REST seems to be predominantly about architecture constraints, of which
this was not one. CouchDB embraces all of the mentioned constraints in
some way or another; namely client/server, statelessness, cacheability, uniform
interfaces, and layered systems.

So, I guess RESTful or non-RESTful is a false dichotomy in this respect.

Additionally, I agree with you on the state of current bulk operations. I think
there is room for improvement, and hopefully some kind of differential update
could be possible at the same time.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 10:23 PM, Noah Slater wrote:

> On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
>> I'm not tunneling verbs, I'm just re-using the names of the methods  
>> that would
>> normally be used as selectors. I wasn't implying anything more than  
>> that.
> ...
>> You delete a document using the DELETE verb, yet in a bulk  
>> operation you set
>> the "_deleted" special attribute. That is in effect tunneling the  
>> DELETE,
>> using a different representation, within a POST.
>
> A RESTful system should work by exchanging representations of  
> resources.
>
> As best I understand it, if you want to modify a resource in a way  
> that is not a
> direct update, move or delete you should use a separate media type,  
> something
> like application/diff+json if it existed. A JSON diff could include  
> ways to
> delete and update multiple documents at the same time, a bit like  
> UNIX diff is
> able to specify filenames. This could be used for single or bulk  
> updates.

Yes, a content-type was something I suggested, but it didn't seem  
right. In a strictly RESTful sense, maybe it does make sense however.

> Of course, this feels very similar to your original proposal, which  
> leaves me a
> little confused. Throwing about JSON with keys such as "POST" and  
> "DELETE" feels
> very RPC-like. Perhaps the difference is the use of a separate media  
> type.

These items, such as 'post' and 'delete', can be equated to the  
'replace'/'insert' et al of my diff proposal, but operating over  
documents rather than JSON trees. The fact that they are so named was  
*purely* an attempt on my part to make it obvious which (singular)  
resource operation (identified by HTTP method) corresponds to that  
document-level operation, and was in no way an attempt to tunnel the  
HTTP mechanism.

The current way _bulk_docs does deletion doesn't feel right. I do  
think there should be some isomorphism between the _bulk_docs  
structure and the operations one would do without using the _bulk_docs  
mechanism, hence my suggestion (but the second temporal ordering, not  
the first operation-type ordering).

> I am eager to be corrected by any resident RESTafarians. For me,  
> REST is a bit
> like Zen. Sometimes I think I understand it totally, and other times  
> I'm
> convinced that I don't understand it at all.

I don't think Couch is truly REST. Certainly _bulk_docs isn't. The  
fact that there are URI patterns means it's not REST, at least not if  
I've understood Roy's recent communications/frustrations, such as
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.

In particular, point 4 seems to disqualify any system, (including  
Couch) that needs the documents in the "Reference" section of the Wiki.

To be REST it has to be just like the web. Using links discovered from  
documents, never constructing them according to some scheme.

However, what does it matter? REST certainly is a slippery sucker, but  
that may be because we want it to be more generally applicable than it  
is. Couch doesn't have to be REST, and I suspect that it in fact  
cannot be.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Man will never be free until the last king is strangled with the  
entrails of the last priest.
   -- Denis Diderot

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 05:35:03PM +1030, Antony Blakey wrote:
> I'm not tunneling verbs, I'm just re-using the names of the methods that would
> normally be used as selectors. I wasn't implying anything more than that.
...
> You delete a document using the DELETE verb, yet in a bulk operation you set
> the "_deleted" special attribute. That is in effect tunneling the DELETE,
> using a different representation, within a POST.

A RESTful system should work by exchanging representations of resources.

As best I understand it, if you want to modify a resource in a way that is not a
direct update, move or delete you should use a separate media type, something
like application/diff+json if it existed. A JSON diff could include ways to
delete and update multiple documents at the same time, a bit like UNIX diff is
able to specify filenames. This could be used for single or bulk updates.

Of course, this feels very similar to your original proposal, which leaves me a
little confused. Throwing about JSON with keys such as "POST" and "DELETE" feels
very RPC-like. Perhaps the difference is the use of a separate media type.

I am eager to be corrected by any resident RESTafarians. For me, REST is a bit
like Zen. Sometimes I think I understand it totally, and other times I'm
convinced that I don't understand it at all.

Best,

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 2:32 PM, Noah Slater wrote:

> On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
>> You would want to allow partial updates in a bulk operation, so any
>> packaging would need to be usable in that context as well. Given  
>> updates
>> need to be handled separately, maybe deletions should be as well.
> ...
>>  "PUT": [
> ...
>>  "PATCH": [
> ...
>>  "DELETE": [
>
> We shouldn't be tunneling verbs though media types, this is  
> antithetical to the
> principals of REST and would harm all manner of possible  
> intermediary clients.

I'm not tunneling verbs, I'm just re-using the names of the methods  
that would normally be used as selectors. I wasn't implying anything  
more than that.

Couch's bulk operation already has this issue. You delete a document  
using the DELETE verb, yet in a bulk operation you set the "_deleted"  
special attribute. That is in effect tunneling the DELETE, using a  
different representation, within a POST.
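
To make the asymmetry concrete, here is a minimal sketch of a client
building a bulk body in which a deletion is expressed as a document
carrying the "_deleted" attribute. The function name and the sample ids
and revisions are hypothetical; only the "docs" wrapper and the
"_deleted" flag are taken from this thread:

```python
import json


def bulk_docs_body(updates, deletions):
    """Build a JSON body for a bulk POST.

    updates   -- full documents to write
    deletions -- {"_id": ..., "_rev": ...} references to delete
    """
    docs = [dict(doc) for doc in updates]
    for ref in deletions:
        # The DELETE is tunnelled as an ordinary document with a flag set.
        docs.append({"_id": ref["_id"], "_rev": ref["_rev"], "_deleted": True})
    return json.dumps({"docs": docs})


# Sample ids and revisions are placeholders, not real CouchDB values.
body = bulk_docs_body(
    updates=[{"_id": "post-1", "_rev": "1", "title": "Hello"}],
    deletions=[{"_id": "post-2", "_rev": "3"}],
)
```

Nothing in the body distinguishes the two kinds of operation except the
flag, which is the point being made above.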

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

There are two ways of constructing a software design: One way is to  
make it so simple that there are obviously no deficiencies, and the  
other way is to make it so complicated that there are no obvious  
deficiencies.
   -- C. A. R. Hoare

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 12:32:18PM +1030, Antony Blakey wrote:
> You would want to allow partial updates in a bulk operation, so any
> packaging would need to be usable in that context as well. Given updates
> need to be handled separately, maybe deletions should be as well.
...
>   "PUT": [
...
>   "PATCH": [
...
>   "DELETE": [

We shouldn't be tunneling verbs through media types; this is antithetical to the
principles of REST and would harm all manner of possible intermediary clients.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 12:32 PM, Antony Blakey wrote:

> {
>  "docs": [
>    /* Just for backwards compatibility ... but does that matter for  
> an alpha product ? */
>    ... as now ...
>  ],
>
>  "PUT": [
>    /* As now with docs, but not allowing "delete":true ? */
>    { "_id": ..., "_rev": ..., ... }
>    ...
>  ],
>  "PATCH": [
>    { "_id": ..., "_rev": ..., deltas: [ { "replace":... }, ... ] }
>    ...
>  ],
>  "DELETE": [
>    { "_id": ..., "_rev": ... },
>    ...
>  ]
> }
>
> This has the benefit of (roughly) representing the HTTP methods that  
> it aggregates.

On second thought, given that it represents an aggregation of commands  
that have an explicit ordering, maybe it shouldn't be grouped by  
method but instead use the method as a key. Like this:

[
   { "delete": { "_id": ..., "_rev": ... } },
   { "put": { "_id": ..., "_rev": ..., ... } },
   { "patch": { "_id": ..., "_rev": ... }, "with": [ { "replace": ..., "with": ... }, ... ] },
   ...
]

The benefit of this is that it is easier to reason about and generate  
if your client code is doing deletes and inserts of documents with the  
same id. It accurately represents adding a transactional boundary  
without requiring a change in semantics.
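
A minimal sketch of how such an ordered list could be applied, assuming
an in-memory store that maps _id to document. This is not how CouchDB
works internally: the store, the function names and the simplistic
revision check are all assumptions, and "patch" is left out for brevity.
It only shows the two properties claimed above, that operations run in
the order given and that the batch applies as a whole or not at all:

```python
import copy


def _check_rev(working, ref):
    """Raise if ref does not name the current revision of an existing doc."""
    current = working.get(ref["_id"])
    if current is None or current.get("_rev") != ref.get("_rev"):
        raise KeyError("conflict on %s" % ref["_id"])


def apply_bulk(store, operations):
    """Apply operations in order to a copy of store and return the copy.

    Any failure raises before the copy is returned, so the caller's store
    is never left half-updated.
    """
    working = copy.deepcopy(store)
    for op in operations:
        if "delete" in op:
            _check_rev(working, op["delete"])
            del working[op["delete"]["_id"]]
        elif "put" in op:
            new_doc = op["put"]
            if new_doc["_id"] in working:
                _check_rev(working, new_doc)
            working[new_doc["_id"]] = dict(new_doc)
        else:
            raise ValueError("unsupported operation: %r" % (op,))
    return working


# Delete and re-insert a document with the same id in one batch.
store = {"a": {"_id": "a", "_rev": "1", "title": "old"}}
result = apply_bulk(store, [
    {"delete": {"_id": "a", "_rev": "1"}},
    {"put": {"_id": "a", "title": "new"}},
])
```

Because the list is ordered, the delete-then-put pair is unambiguous; with
the grouped-by-method form the same intent would depend on an implied
ordering between the groups.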

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

When I hear somebody sigh, 'Life is hard,' I am always tempted to ask,  
'Compared to what?'
   -- Sydney Harris

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 10:07 AM, Noah Slater wrote:

> I think I agree that if this was to hit CouchDB it should be done  
> via the PATCH
> method, makes the most sense given the context.

You would want to allow partial updates in a bulk operation, so any  
packaging would need to be usable in that context as well. Given  
updates need to be handled separately, maybe deletions should be as  
well.

{
   "docs": [
     /* Just for backwards compatibility ... but does that matter for an alpha product? */
     ... as now ...
   ],

   "PUT": [
     /* As now with docs, but not allowing "_deleted": true? */
     { "_id": ..., "_rev": ..., ... },
     ...
   ],
   "PATCH": [
     { "_id": ..., "_rev": ..., "deltas": [ { "replace": ... }, ... ] },
     ...
   ],
   "DELETE": [
     { "_id": ..., "_rev": ... },
     ...
   ]
}

This has the benefit of (roughly) representing the HTTP methods that  
it aggregates.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

It is no measure of health to be well adjusted to a profoundly sick  
society.
   -- Jiddu Krishnamurti

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 10:07 AM, Noah Slater wrote:

> I did some digging to see what else is out there:
>
> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
> * http://www.snellspace.com/wp/?p=895
> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
> * http://www.snellspace.com/wp/?p=902
>
> I think I agree that if this was to hit CouchDB it should be done  
> via the PATCH
> method, makes the most sense given the context.

My only concern with PATCH is that it isn't HTTP 1.1. When I was  
working with WebDAV we continually had problems with proxies that  
didn't deal with non-HTTP-1.1 methods.

IMO the only other alternative is a POST with either a content-type,  
although I feel uneasy about that if the content actually *is* JSON,  
or a query parameter.

Alternatively you could use a POST to an extended URL, but that  
interferes with attachments. And as I understand it, that would only  
truly qualify as REST if it was included in a document, i.e.  
discoverable rather than specified.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?
His goal: transcend dental medication.

Re: Document Updates

Posted by Adam Kocoloski <ad...@gmail.com>.

Partial updates are also a very popular discussion topic on the  
restful-json Google group:

http://groups.google.com/group/restful-json

But AFAIK no consensus has been reached yet.  Best, Adam

On Nov 13, 2008, at 6:37 PM, Noah Slater wrote:

> On Fri, Nov 14, 2008 at 09:09:03AM +1030, Antony Blakey wrote:
>> IMO the simplest thing that would work (ignoring representation)  
>> looks
>> something like this:
>>
>>  insert <json> in <jsonpath>
>>  insert <json> after <jsonpath>
>>  insert <json> before <jsonpath>
>>  delete <jsonpath>
>>  replace <jsonpath> with <json>
>>
>> where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
>> without the executable expressions.
>
> Hmm, this seems pretty cool.
>
> I did some digging to see what else is out there:
>
> * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
> * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
> * http://www.snellspace.com/wp/?p=895
> * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
> * http://www.snellspace.com/wp/?p=902
>
> I think I agree that if this was to hit CouchDB it should be done  
> via the PATCH
> method, makes the most sense given the context.
>
> -- 
> Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 09:09:03AM +1030, Antony Blakey wrote:
> IMO the simplest thing that would work (ignoring representation) looks
> something like this:
>
>   insert <json> in <jsonpath>
>   insert <json> after <jsonpath>
>   insert <json> before <jsonpath>
>   delete <jsonpath>
>   replace <jsonpath> with <json>
>
> where jsonpath is roughly as: http://goessner.net/articles/JsonPath/
> without the executable expressions.

Hmm, this seems pretty cool.

I did some digging to see what else is out there:

 * http://intertwingly.net/blog/2008/02/21/APP-Level-Patch
 * http://blog.mozilla.com/rob-sayre/2008/02/15/restful-partial-updates/
 * http://www.snellspace.com/wp/?p=895
 * http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0316.html
 * http://www.snellspace.com/wp/?p=902

I think I agree that if this was to hit CouchDB it should be done via the PATCH
method, makes the most sense given the context.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 8:33 AM, Noah Slater wrote:

> On Fri, Nov 14, 2008 at 08:19:22AM +1030, Antony Blakey wrote:
>> The relevent section from XQuery Update,
>> http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might  
>> be useful
>> starting point for defining a JSON-encoded (recursive) EDL-based  
>> structural
>> diff.
>
> I think an XQuery/XPath type solution would be very interesting.

IMO the simplest thing that would work (ignoring representation) looks  
something like this:

   insert <json> in <jsonpath>
   insert <json> after <jsonpath>
   insert <json> before <jsonpath>
   delete <jsonpath>
   replace <jsonpath> with <json>

where jsonpath is roughly as: http://goessner.net/articles/JsonPath/  
without the executable expressions.

Diff computation would undoubtedly generate a restricted subset of  
jsonpath selectors, but it's worth supporting the wildcard/recursive  
descent operations for clients.

Representing the update document as json itself would be clean, so an  
EDL could look like this:

[
   { "replace": "$.post.comments[2].email", "with": "antony@linkuistics.com" },
   { "insert": { "email":.... }, "in": "$.post.comments" },
   { "insert": { "email":.... }, "after": "$.post.comments[5]" }
]

or, using a meta-encoding (which IMO is unnecessary):

[
   { "op": "replace", "path": "$.post.comments[2].email", "content": "antony@linkuistics.com" },
   { "op": "insert-in", "path": "$.post.comments", "content": { "email":.... } },
   { "op": "insert-after", "path": "$.post.comments[5]", "content": { "email":.... } }
]

I propose that these aren't declarative, but procedural, in the sense  
that they are applied linearly and hence each path context is the  
result of the preceding edits, rather than the original tree. This  
complicates the encoding of diffs but results in a much simpler apply  
mechanism. But maybe it would be worth using a declarative form with a  
constant context - I'm unsure about the tradeoffs.
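
A minimal sketch of the procedural apply step, assuming the first JSON
encoding above and a restricted path syntax of only '$', '.name' and
'[index]' (no wildcards or recursive descent). The function names are
hypothetical, and the choice that "insert ... in" appends to an array is
an assumption, since the proposal does not say where the new element
goes:

```python
import copy
import re

# One path step: either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path):
    """Split a path such as '$.post.comments[2].email' into steps."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError("unsupported path syntax: %r" % path)
        name, index = match.groups()
        steps.append(name if name is not None else int(index))
        pos = match.end()
    return steps


def _walk(node, steps):
    for step in steps:
        node = node[step]
    return node


def _parent_and_key(doc, path):
    steps = parse_path(path)
    if not steps:
        raise ValueError("the root itself cannot be the target of this edit")
    return _walk(doc, steps[:-1]), steps[-1]


def apply_edits(doc, edits):
    """Apply edits in order; each path sees the result of earlier edits."""
    doc = copy.deepcopy(doc)
    for edit in edits:
        if "replace" in edit:
            parent, key = _parent_and_key(doc, edit["replace"])
            parent[key] = edit["with"]
        elif "delete" in edit:
            parent, key = _parent_and_key(doc, edit["delete"])
            del parent[key]
        elif "insert" in edit and "in" in edit:
            # Assumption: inserting "in" an array appends to it.
            _walk(doc, parse_path(edit["in"])).append(edit["insert"])
        elif "insert" in edit and "before" in edit:
            parent, index = _parent_and_key(doc, edit["before"])
            parent.insert(index, edit["insert"])
        elif "insert" in edit and "after" in edit:
            parent, index = _parent_and_key(doc, edit["after"])
            parent.insert(index + 1, edit["insert"])
        else:
            raise ValueError("unknown edit: %r" % (edit,))
    return doc


original = {"post": {"comments": [{"email": "a@example.com"},
                                  {"email": "b@example.com"}]}}
edited = apply_edits(original, [
    {"insert": {"email": "c@example.com"}, "before": "$.post.comments[0]"},
    # Index 1 now names the comment that was at index 0 before the insert.
    {"replace": "$.post.comments[1].email", "with": "changed@example.com"},
])
```

In the usage above the second edit changes the comment that was first in
the original tree, which is exactly the procedural behaviour described: a
diff generator has to track index shifts, while the apply side stays a
simple loop.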

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

He who would make his own liberty secure, must guard even his enemy  
from repression.
   -- Thomas Paine

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Fri, Nov 14, 2008 at 08:19:22AM +1030, Antony Blakey wrote:
> The relevent section from XQuery Update,
> http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be useful
> starting point for defining a JSON-encoded (recursive) EDL-based structural
> diff.

I think an XQuery/XPath type solution would be very interesting.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 3:13 AM, Noah Slater wrote:

> On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
>> I think that this should be pretty easily done using:
>> a) well defined pretty format output
>> b) standard diff
>>
>> The reason for (a) is that you need this to get line breaks, which  
>> are
>> critical to diffing correctly.
>
> It's a bit more complex than that, canonicalised JSON is still in  
> it's infancy,
> so we would have to get the community to adopt that first. I know  
> that people
> have been discussing JSON diffs before, may be worth looking up  
> what's already
> been done on this.

Given the XML/JSON isomorphism, I wonder if something like this:
http://www.springerlink.com/content/r1t6h8631868k615/
would be a good start for computing a diff.

The relevant section from XQuery Update,
http://www.w3.org/TR/xquery-update-10/#id-update-primitives, might be a
useful starting point for defining a JSON-encoded (recursive) EDL-based
structural diff.

IME a structural diff is better for these purposes than a traditional  
text-diff over a canonicalized format. It's certainly easier to  
generate for clients.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

On the other side, you have the customer and/or user, and they tend to  
do what we call "automating the pain." They say, "What is it we're  
doing now? How would that look if we automated it?" Whereas, what the  
design process should properly be is one of saying, "What are the  
goals we're trying to accomplish and how can we get rid of all this  
task crap?"
   -- Alan Cooper

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Thu, Nov 13, 2008 at 06:40:44PM +0200, Ayende Rahien wrote:
> I think that this should be pretty easily done using:
> a) well defined pretty format output
> b) standard diff
>
> The reason for (a) is that you need this to get line breaks, which are
> critical to diffing correctly.

It's a bit more complex than that; canonicalised JSON is still in its infancy,
so we would have to get the community to adopt that first. I know that people
have been discussing JSON diffs before, may be worth looking up what's already
been done on this.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Antony Blakey <an...@gmail.com>.

On 14/11/2008, at 3:30 AM, Damien Katz wrote:

> I was planning on something similar this for field and attachment  
> level replication, where only the fields or attachments that are  
> changed are replicated.

Are you planning on sending attachment deltas e.g. rsync/unison? That  
would be enormously useful for my CouchDB app.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –

Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –

   -– Emily Dickinson 913 (1865)

Re: Document Updates

Posted by "ara.t.howard" <ar...@gmail.com>.

On Nov 13, 2008, at 10:14 AM, Michael Ramirez wrote:

> If I begin breaking up my documents into related documents aren't I  
> just creating a relational database?

this is where i find myself too: all the modeling issues with couch  
seem to elicit this suggestion.  the issue is that it's quite difficult  
to manipulate multiple docs without facilities like 'select for  
update' and 'begin transaction'.  so far the only approach i've come  
up with, once docs are split out, is to read them all, perform the  
update, and then write them all back.  otherwise any computed value  
risks being based on stale data.

it really does seem strange to me that so many solutions to couch  
involve re-creating relational constructs - like there must be a  
better way....

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Thu, Nov 13, 2008 at 09:14:47AM -0800, Michael Ramirez wrote:
> If I begin breaking up my documents into related documents aren't I just
> creating a relational database?

Well, I don't think Damien was dismissing differential updates.

I do disagree with Damien on his points about root level changes. I think a
generic JSON diff format would be hugely advantageous. Again, this is the kind
of thing that would need to be standardised and baked into JSON client libraries
before it could be used properly though.

I think Damien was pointing out that no matter if you have differential updates,
the size of the documents still affects performance; disk IO, memory and view
calculation all suffer. So, there is a balance to strike between convenience and
performance. It is entirely up to you how that should be addressed per app.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Damien Katz <da...@apache.org>.

Not necessarily relational, it depends on the use case and how much  
you can denormalize. But if the document keeps growing, is it really a  
document, or a bunch of documents bound together?

While some XML databases allow documents that are gigabytes or even  
terabytes in size, CouchDB documents are meant to be individually held  
in-memory. And while both operate on documents, the query and access  
models differ greatly. It might be that an XML or relational database is  
a better fit for your app.

-Damien


On Nov 13, 2008, at 12:14 PM, Michael Ramirez wrote:

> If I begin breaking up my documents into related documents aren't I  
> just creating a relational database?
>
>
> Michael
>
> ----- Original Message ----
> From: Damien Katz <da...@apache.org>
> To: couchdb-user@incubator.apache.org
> Sent: Thursday, November 13, 2008 10:00:44 AM
> Subject: Re: Document Updates
>
> I was planning on something similar this for field and attachment  
> level replication, where only the fields or attachments that are  
> changed are replicated. With the scheme I'm thinking of, it's  
> possible to have it incremental at any nested level of the doc tree,  
> but I'm not sure the extra overhead is worth doing it beyond the  
> root fields.
>
> However, Michael's concern of the document getting larger and the  
> app getting slower still applies, the document must still be loaded  
> into memory on the server and the diffs applied, and the complete  
> doc will need to be loaded into memory for view indexing too.  
> Michael, regardless of the diff updates, I'm thinking you need to  
> break you document up into multiple documents.
>
> -Damien
>
> On Nov 13, 2008, at 11:40 AM, Ayende Rahien wrote:
>
>> I think that this should be pretty easily done using:
>> a) well defined pretty format output
>> b) standard diff
>>
>> The reason for (a) is that you need this to get line breaks, which  
>> are
>> critical to diffing correctly.
>>
>> On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater <ns...@apache.org>  
>> wrote:
>>
>>> On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
>>>> Will this cause bandwidth issues when updating large documents if  
>>>> only a
>>>> single field changes. I am afraid that as my documents grow  
>>>> larger my app
>>> gets
>>>> slower.
>>>
>>> I for one am interested to hear JSON diff proposals. I think this  
>>> would
>>> make a
>>> great addition to CouchDB. As best I can tell, this should really  
>>> be done
>>> as an
>>> external standardisation effort so the whole community could  
>>> benifit. I
>>> don't
>>> think using JavaScript to set the document attributes is a very good
>>> solution to
>>> this. An entirely new Media Type is needed, IMHO.
>>>
>>> --
>>> Noah Slater, http://tumbolia.org/nslater
>>>
>
>
>

Re: Document Updates

Posted by Michael Ramirez <mi...@yahoo.com>.

If I begin breaking up my documents into related documents aren't I just creating a relational database?

Michael

----- Original Message ----
From: Damien Katz <da...@apache.org>
To: couchdb-user@incubator.apache.org
Sent: Thursday, November 13, 2008 10:00:44 AM
Subject: Re: Document Updates

I was planning on something similar this for field and attachment level replication, where only the fields or attachments that are changed are replicated. With the scheme I'm thinking of, it's possible to have it incremental at any nested level of the doc tree, but I'm not sure the extra overhead is worth doing it beyond the root fields.

However, Michael's concern of the document getting larger and the app getting slower still applies, the document must still be loaded into memory on the server and the diffs applied, and the complete doc will need to be loaded into memory for view indexing too. Michael, regardless of the diff updates, I'm thinking you need to break you document up into multiple documents.

-Damien

On Nov 13, 2008, at 11:40 AM, Ayende Rahien wrote:

> I think that this should be pretty easily done using:
> a) well defined pretty format output
> b) standard diff
> 
> The reason for (a) is that you need this to get line breaks, which are
> critical to diffing correctly.
> 
> On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater <ns...@apache.org> wrote:
> 
>> On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
>>> Will this cause bandwidth issues when updating large documents if only a
>>> single field changes. I am afraid that as my documents grow larger my app
>> gets
>>> slower.
>> 
>> I for one am interested to hear JSON diff proposals. I think this would
>> make a
>> great addition to CouchDB. As best I can tell, this should really be done
>> as an
>> external standardisation effort so the whole community could benifit. I
>> don't
>> think using JavaScript to set the document attributes is a very good
>> solution to
>> this. An entirely new Media Type is needed, IMHO.
>> 
>> --
>> Noah Slater, http://tumbolia.org/nslater
>>

Re: Document Updates

Posted by Damien Katz <da...@apache.org>.

I was planning on something similar to this for field and attachment  
level replication, where only the fields or attachments that are  
changed are replicated. With the scheme I'm thinking of, it's possible  
to have it incremental at any nested level of the doc tree, but I'm  
not sure the extra overhead is worth doing it beyond the root fields.
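
A minimal sketch of what a diff limited to the root fields could look
like. This is not the scheme referred to above, whose details are not
given in this thread; the "set"/"unset" shape and the function names are
assumptions for illustration only:

```python
def root_field_diff(old, new):
    """Describe how to turn old into new, comparing root-level fields only."""
    changed = {key: value for key, value in new.items()
               if key not in old or old[key] != value}
    removed = sorted(key for key in old if key not in new)
    return {"set": changed, "unset": removed}


def apply_root_field_diff(old, diff):
    """Return a new document with the diff applied to a copy of old."""
    result = dict(old)
    result.update(diff["set"])
    for key in diff["unset"]:
        result.pop(key, None)
    return result


old = {"_id": "post-1", "title": "Hello", "body": "x" * 1000, "draft": True}
new = {"_id": "post-1", "title": "Hello, world", "body": "x" * 1000}
diff = root_field_diff(old, new)
```

Here the large unchanged "body" field is not part of the diff, but a
change nested deep inside one root field would still resend that whole
field, which is the trade-off against going below the root level.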

However, Michael's concern of the document getting larger and the app  
getting slower still applies, the document must still be loaded into  
memory on the server and the diffs applied, and the complete doc will  
need to be loaded into memory for view indexing too. Michael,  
regardless of the diff updates, I'm thinking you need to break you  
document up into multiple documents.

-Damien

On Nov 13, 2008, at 11:40 AM, Ayende Rahien wrote:

> I think that this should be pretty easily done using:
> a) well defined pretty format output
> b) standard diff
>
> The reason for (a) is that you need this to get line breaks, which are
> critical to diffing correctly.
>
> On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater <ns...@apache.org>  
> wrote:
>
>> On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
>>> Will this cause bandwidth issues when updating large documents if  
>>> only a
>>> single field changes. I am afraid that as my documents grow larger  
>>> my app
>> gets
>>> slower.
>>
>> I for one am interested to hear JSON diff proposals. I think this  
>> would
>> make a
>> great addition to CouchDB. As best I can tell, this should really  
>> be done
>> as an
>> external standardisation effort so the whole community could  
>> benifit. I
>> don't
>> think using JavaScript to set the document attributes is a very good
>> solution to
>> this. An entirely new Media Type is needed, IMHO.
>>
>> --
>> Noah Slater, http://tumbolia.org/nslater
>>

Re: Document Updates

Posted by Ayende Rahien <ay...@ayende.com>.

I think that this should be pretty easily done using:
a) a well-defined pretty-printed output format
b) standard diff

The reason for (a) is that you need this to get line breaks, which are
critical to diffing correctly.
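
A minimal sketch of the idea, assuming Python's standard json and difflib modules stand in for the "pretty format" and the "standard diff"; the function names are only for illustration:

```python
import difflib
import json


def pretty(doc: dict) -> list:
    """Serialise a document one value per line, with keys in sorted order.

    Sorting the keys makes the output stable, so two serialisations of
    the same document never differ just because of key order.
    """
    return json.dumps(doc, indent=2, sort_keys=True).splitlines()


def json_line_diff(old: dict, new: dict) -> list:
    """Return a unified diff between the pretty forms of two documents."""
    return list(
        difflib.unified_diff(
            pretty(old), pretty(new), fromfile="old", tofile="new", lineterm=""
        )
    )
```

One wrinkle with a purely line-based approach: adding a field after the current last field also changes the previous line, because it gains a trailing comma, so the diff is not always minimal.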

On Thu, Nov 13, 2008 at 6:38 PM, Noah Slater <ns...@apache.org> wrote:

> On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
> > Will this cause bandwidth issues when updating large documents if only a
> > single field changes? I am afraid that as my documents grow larger my app
> > gets slower.
>
> I for one am interested to hear JSON diff proposals. I think this would make a
> great addition to CouchDB. As best I can tell, this should really be done as an
> external standardisation effort so the whole community could benefit. I don't
> think using JavaScript to set the document attributes is a very good solution to
> this. An entirely new Media Type is needed, IMHO.
>
> --
> Noah Slater, http://tumbolia.org/nslater
>

Re: Document Updates

Posted by Noah Slater <ns...@apache.org>.

On Thu, Nov 13, 2008 at 08:30:17AM -0800, Michael Ramirez wrote:
> Will this cause bandwidth issues when updating large documents if only a
> single field changes? I am afraid that as my documents grow larger my app gets
> slower.

I for one am interested to hear JSON diff proposals. I think this would make a
great addition to CouchDB. As best I can tell, this should really be done as an
external standardisation effort so the whole community could benefit. I don't
think using JavaScript to set the document attributes is a very good solution to
this. An entirely new Media Type is needed, IMHO.

-- 
Noah Slater, http://tumbolia.org/nslater

Re: Document Updates

Posted by Michael Ramirez <mi...@yahoo.com>.

Will this cause bandwidth issues when updating large documents if only a single field changes? I am afraid that as my documents grow larger, my app will get slower.

Michael

----- Original Message ----
From: Paul Davis <pa...@gmail.com>
To: couchdb-user@incubator.apache.org
Sent: Thursday, November 13, 2008 9:22:11 AM
Subject: Re: Document Updates

The entire document.

On Thu, Nov 13, 2008 at 11:20 AM, Michael Ramirez
<mi...@yahoo.com> wrote:
> When updating documents must the entire document be resent or just the changed fields?
>
> Michael
>

Re: Document Updates

Posted by Paul Davis <pa...@gmail.com>.

The entire document.
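
In practice that means fetching the current document, changing the field locally, and PUTting the whole thing back with its _rev. A minimal sketch using only Python's standard library; the base URL, database name and helper name are placeholders, and the request is only built here, not sent:

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url: str, db: str, current_doc: dict, changes: dict):
    """Build a PUT that resends the whole document with the changes merged in.

    current_doc is the copy previously fetched from the server, so it
    already carries the _id and _rev that the update needs.
    """
    updated = dict(current_doc)  # keep _id, _rev and every unchanged field
    updated.update(changes)      # overwrite only the fields that changed
    body = json.dumps(updated).encode("utf-8")
    doc_id = urllib.parse.quote(updated["_id"], safe="")
    return urllib.request.Request(
        f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen() would send it. Note that the request body contains every field, not just the one that changed.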

On Thu, Nov 13, 2008 at 11:20 AM, Michael Ramirez
<mi...@yahoo.com> wrote:
> When updating documents must the entire document be resent or just the changed fields?
>
> Michael
>