Posted to user@couchdb.apache.org by Manolo Padron Martinez <ma...@gmail.com> on 2009/02/23 15:00:59 UTC

Fail on a simple case on replication

Hi:

I'm trying to test the replication process with two local databases, and I
found that the replication process doesn't work as it should (or as I think
it would).

The case:

1º Create a db called t2.
2º Create a document called terminator.
3º Add a property to the document, so that makes a new revision, with a
property called speed and the value 1
4º Create a new db called t3.
5º Launch replication process from t2 to t3.

In t3 there should be a document with two revisions, and if I point to
"t3/terminator?revs=true" two revisions appear. If I try to get the last
revision it works as it should, but if I try to get the first revision (the
one without properties) I get a "not found" message.

In the t2 database this works without problems, so I think it is a problem
with replication.

I've tried with Debian, with the latest release on the web (0.8.1), and with
the trunk svn version, with the same results.
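
The case above can be written down as a sequence of HTTP requests. This is a
minimal sketch that only builds the requests and sends nothing. It assumes a
CouchDB listening on localhost:5984 and a "_replicate" endpoint that takes a
JSON body with "source" and "target"; the "<rev-1>" value is a placeholder
for whatever revision id the server returns when the document is created.

```python
import json

BASE = "http://localhost:5984"  # assumption: default local CouchDB address


def repro_requests(first_rev="<rev-1>", base=BASE):
    """Return (method, url, body) triples for the five steps plus two checks.

    Nothing is sent. first_rev is a placeholder for the revision id that the
    server hands back when the document is first created.
    """
    doc_v2 = {"_rev": first_rev, "speed": 1}
    replication = {"source": "t2", "target": "t3"}
    return [
        ("PUT", f"{base}/t2", None),                              # 1. create db t2
        ("PUT", f"{base}/t2/terminator", json.dumps({})),         # 2. create the document
        ("PUT", f"{base}/t2/terminator", json.dumps(doc_v2)),     # 3. add speed=1 (new revision)
        ("PUT", f"{base}/t3", None),                              # 4. create db t3
        ("POST", f"{base}/_replicate", json.dumps(replication)),  # 5. replicate t2 -> t3
        ("GET", f"{base}/t3/terminator?revs=true", None),         # lists both revisions
        ("GET", f"{base}/t3/terminator?rev={first_rev}", None),   # reported as "not found"
    ]


for method, url, body in repro_requests():
    print(method, url, body or "")
```

The last two requests are the ones whose results differ between t2 and t3 in
the report.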

Could anyone help me, or will the terminator kill me? :-)

Thanks in advance

Manolo Padrón Martínez

Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

OK, updated:
http://wiki.apache.org/couchdb/Document_revisions

Created: http://wiki.apache.org/couchdb/Replication (needs more work)


Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 23 Feb 2009, at 16:40, Patrick Antivackis wrote:

> May be i can start a wiki page on replication, but i think the
> http://couchdb.apache.org/docs/overview.html should be clarified too.

Hey yeah, feel free to add new pages and fix existing ones as you
see fit, thanks! :)

Cheers
Jan
--
(Still +1 for the rename :)


Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

Hi Jan,
You are right about the document revision wiki page; unfortunately there is
nothing about replication (at least nothing that I found).
The only thing on replication I found is:
http://couchdb.apache.org/docs/overview.html
where it says:
"The replication process is incremental. At the database level, replication
only examines documents updated since the last replication. Then for each
updated document, only fields and blobs that have changed are replicated
across the network. If replication fails at any step, due to network
problems or crash for example, the next replication restarts at the same
document where it left off."

I wrote something like this in the "proposed replication rev history
changes" mailing list thread.

version of a document: a version of a document is identified by an id and a
revision. The version of the document contains all the fields/values as they
were at this specific revision.

revision history: the list of all the revisions of the document, beginning
with the most recent.

Compaction of a database removes all previous versions of a document but
keeps the revision history, so revs=true for a document will return the
revision history of this document, but of course I will not be able to see
what a previous revision of the document contained.

Replication only replicates the last version of a document. If I replicate
baseA to an empty baseB, baseB will contain the last version of all
non-deleted documents and, for each document, the full revision history. So
for each document I'm able to do a revs=true, but I am unable to see the
content of a previous version.
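
The behaviour described above can be illustrated with a toy in-memory model.
This is only a sketch of the description in this message, not CouchDB's
implementation: the ToyDatabase class, its method names and the "1", "2"
revision ids are all made up for the example, and deleted documents are
ignored.

```python
class ToyDatabase:
    """Toy in-memory model of the behaviour described above (not CouchDB code)."""

    def __init__(self):
        self.history = {}  # doc id -> list of revision ids, most recent first
        self.bodies = {}   # (doc id, revision id) -> document content

    def save(self, doc_id, body):
        revs = self.history.setdefault(doc_id, [])
        rev = str(len(revs) + 1)  # simplistic revision ids: "1", "2", ...
        revs.insert(0, rev)
        self.bodies[(doc_id, rev)] = dict(body)
        return rev

    def get(self, doc_id, rev=None):
        rev = rev or self.history[doc_id][0]  # default to the latest revision
        if (doc_id, rev) not in self.bodies:
            raise KeyError("not found")
        return self.bodies[(doc_id, rev)]

    def revs(self, doc_id):
        return list(self.history[doc_id])

    def compact(self):
        # Drop every body except the most recent one; keep the history list.
        latest = {(doc_id, revs[0]) for doc_id, revs in self.history.items()}
        self.bodies = {k: v for k, v in self.bodies.items() if k in latest}


def replicate(source, target):
    # Copy only the latest body of each document, plus its full history.
    for doc_id, revs in source.history.items():
        target.history[doc_id] = list(revs)
        target.bodies[(doc_id, revs[0])] = dict(source.bodies[(doc_id, revs[0])])


t2, t3 = ToyDatabase(), ToyDatabase()
first = t2.save("terminator", {})
t2.save("terminator", {"speed": 1})
replicate(t2, t3)
print(t3.revs("terminator"))  # ['2', '1']
print(t3.get("terminator"))   # {'speed': 1}
```

In this model t2.get("terminator", first) still returns the empty first
version, while the same call on t3 raises "not found", which matches the
report that started this thread. Calling t2.compact() makes t2 behave like
t3.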

Maybe I can start a wiki page on replication, but I think
http://couchdb.apache.org/docs/overview.html should be clarified too.


2009/2/23 Jan Lehnardt <ja...@apache.org>

>
> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>
>> For a reminder :
>>
>> revision  (n)
>> 1. the act or process of revising,
>> 2. a corrected or new version of a book, article, etc.
>>
>> For me this term is correct with the use in Couch
>>
>
> Damien is not saying the usage is wrong in CouchDB, but people
> associate more with "revision" than he'd like. Hence the proposal.
>
>
>> I think a good explanation of what a compaction/replication are doing (ie
>> removing old rev, or replicating only current rev) is the right solution
>> to this misunderstanding
>>
>
> Can you suggest how we improve the wiki docs to satisfy this? In my
> opinion, the docs are clear* and the term is overloaded and confusing.
>
> * http://wiki.apache.org/couchdb/Document_revisions has
> "You cannot rely on document revisions for any other purpose
> than concurrency control." in bold letters.
>
> I stated this in earlier discussions as well: Even if our documentation
> were perfect, we don't control how people learn about CouchDB. We
> only control the API and we should work hard to get it right.
>
> The way it stands now, a lot of people new to CouchDB get it wrong
> because "revision" is a familiar term and they associate the behaviour
> they associate with it to them. That's how humans learn. In this case
> we make the learning hard.
>
> Cheers
> Jan
> --
>
>
>
>>> - Remove the ability to get old revisions
>>
>> -1 : This functionality is interesting for some case studies
>>
>>> - Make it much harder/verbose to get old revision
>>
>> -1 : I don't see the utility of this
>>
>>> - Make the api to get old revisions something like
>>> "?old_rev_that_might_still_be_on_disk=...."
>>
>> 0 :
>>
>>> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
>>> People won't know what they are, but at least they won't misuse them.
>>
>> -1 : revision seems the right term to me
>>
>>> -Damien
>>>>
>>>> Begin forwarded message:
>>>>
>>>> From: Damien Katz <da...@apache.org>
>>>>
>>>>> Date: February 23, 2009 9:09:09 AM EST
>>>>> To: user@couchdb.apache.org
>>>>> Subject: Re: Fail on a simple case on replication
>>>>> Reply-To: user@couchdb.apache.org
>>>>>
>>>>> Revisions are made available as a convenience, but CouchDB doesn't
>>>>> replicate old revisions, only the most recent. Also compaction will
>>>>> remove
>>>>> old revisions as well.
>>>>>
>>>>> -Damien
>>>>>
>>>>> On Feb 23, 2009, at 9:00 AM, Manolo Padron Martinez wrote:
>>>>>
>>>>> Hi:
>>>>>
>>>>>>
>>>>>> I'm trying to test the replication process with two local database and
>>>>>> I
>>>>>> found that replication process don't work as it should (or as I think
>>>>>> it
>>>>>> would)
>>>>>>
>>>>>> The case:
>>>>>>
>>>>>> 1º Create a db called t2.
>>>>>> 2º Create a document called terminator.
>>>>>> 3º Add a property to the document, so that makes a new revision, with
>>>>>> a
>>>>>> property called speed and the value 1
>>>>>> 4º Create a new db called t3.
>>>>>> 5º Launch replication process from t2 to t3.
>>>>>>
>>>>>> In t3 should be a document with two revisions, and if I point to
>>>>>> "t3/terminator?revs=true" appears two revisions. If I try to get the
>>>>>> last
>>>>>> revision it works as it should but If I try to get the first revision
>>>>>> (the
>>>>>> one without properties) I get a "not found" message.
>>>>>>
>>>>>> In t2 database, this works without problems so I think that is a
>>>>>> problem
>>>>>> with replication.
>>>>>>
>>>>>> I've tried with debian , with the lastest in the web (0.8.1), and the
>>>>>> trunk
>>>>>> svn version with the same results.
>>>>>>
>>>>>> Anyone could help me or the terminator will kill me? :-)
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Manolo Padrón Martínez
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 25/02/2009, at 2:55 PM, Chris Anderson wrote:

> Reiterating: I think the clean solution is to remove the API for
> loading docs at a particular rev. Instead we allow only the loading of
> all conflicted revs (or of course the HEAD rev). I'll wait for people
> to say why this is a bad idea before I say why it's a good idea.

Also, without access to the common ancestor of a document and its
conflicts, you can't use three-way merging as a conflict resolution
strategy, because you only have instantaneous state. Or am I wrong to
think this is possible in any case? Can you chase down a conflict's
ancestry to find the divergent point?
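
To make the dependency on the common ancestor concrete, here is a minimal
sketch of a field-by-field three-way merge over plain dictionaries. The
function name and the "keep ours on conflict" choice are assumptions made for
the example; nothing here is a CouchDB API.

```python
def three_way_merge(ancestor, ours, theirs):
    """Field-by-field three-way merge of two versions against their ancestor.

    Returns (merged, conflicts), where conflicts lists the fields that both
    sides changed to different values.
    """
    merged, conflicts = {}, []
    missing = object()  # sentinel meaning "field absent"
    for key in sorted(set(ancestor) | set(ours) | set(theirs)):
        base = ancestor.get(key, missing)
        a = ours.get(key, missing)
        b = theirs.get(key, missing)
        if a == b:        # both sides agree (or neither touched the field)
            value = a
        elif a == base:   # only "theirs" changed the field
            value = b
        elif b == base:   # only "ours" changed the field
            value = a
        else:             # both changed it differently: a real conflict
            conflicts.append(key)
            value = a     # arbitrary choice for the sketch: keep "ours"
        if value is not missing:
            merged[key] = value
    return merged, conflicts


ancestor = {"speed": 1}
ours = {"speed": 1, "model": "T-800"}
theirs = {"speed": 2}
print(three_way_merge(ancestor, ours, theirs))
print(three_way_merge({}, ours, theirs))
```

With the ancestor available, the first call merges cleanly because each field
was changed on one side only. The second call passes an empty ancestor to
mimic having only instantaneous state, and "speed" is then reported as a
conflict because there is no way to tell which side changed it.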

However, both this point and my previous point are moot, because the
replication model means that access to arbitrary previous revisions is
only likely on the node where the revisions were written. Access to
only the head and its conflicts (and the conflict chain, I presume) is
all that is consistent with replication.

I'm wondering therefore if lazy updating externals that respect the  
request's update_seq are not in fact possible given replication?

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Borrow money from pessimists - they don't expect it back.
   -- Steven Wright

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 25/02/2009, at 2:55 PM, Chris Anderson wrote:

> Reiterating: I think the clean solution is to remove the API for
> loading docs at a particular rev. Instead we allow only the loading of
> all conflicted revs (or of course the HEAD rev). I'll wait for people
> to say why this is a bad idea before I say why it's a good idea.

So ... ?

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A reasonable man adapts himself to suit his environment. An  
unreasonable man persists in attempting to adapt his environment to  
suit himself. Therefore, all progress depends on the unreasonable man.
   -- George Bernard Shaw

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

On Tue, Feb 24, 2009 at 8:45 PM, Antony Blakey <an...@gmail.com> wrote:
>
> On 25/02/2009, at 2:55 PM, Chris Anderson wrote:
>
>> Reiterating: I think the clean solution is to remove the API for
>> loading docs at a particular rev. Instead we allow only the loading of
>> all conflicted revs (or of course the HEAD rev). I'll wait for people
>> to say why this is a bad idea before I say why it's a good idea.
>
> It might be a problem for externals that:
>
> a) want to use all_docs_by_seq as a lazy update mechanism without including
> the docs. I can't immediately think why you'd want to do that, but this
> would make it impossible.
>
> b) want to use the conflict data that is consistent with a given MVCC
> snapshot (e.g. the request's update_seq), for which they could theoretically
> need the data from a conflict that is no longer a head conflict.
>
> Edge cases admittedly, but disallowing access to previous revisions would
> force all queries to be dealing with the head, which isn't the case for lazy
> externals in particular.

These are good arguments for maintaining the functionality, but having
it "off" by default. If you need it for an external, you could turn it
on, for the nodes that use that external.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 25/02/2009, at 2:55 PM, Chris Anderson wrote:

> Reiterating: I think the clean solution is to remove the API for
> loading docs at a particular rev. Instead we allow only the loading of
> all conflicted revs (or of course the HEAD rev). I'll wait for people
> to say why this is a bad idea before I say why it's a good idea.

It might be a problem for externals that:

a) want to use all_docs_by_seq as a lazy update mechanism without  
including the docs. I can't immediately think why you'd want to do  
that, but this would make it impossible.

b) want to use the conflict data that is consistent with a given MVCC  
snapshot (e.g. the request's update_seq), for which they could  
theoretically need the data from a conflict that is no longer a head  
conflict.

Edge cases admittedly, but disallowing access to previous revisions  
would force all queries to be dealing with the head, which isn't the  
case for lazy externals in particular.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?
His goal: transcend dental medication.

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

On Tue, Feb 24, 2009 at 9:52 AM, Chris Anderson <jc...@apache.org> wrote:
> On another note, I was thinking about it some more, and I think that
> renaming _rev to _cc would be a huge pain in the ass for a lot of
> people (who don't go around abusing it) and it can probably be
> avoided.
>
> The only valid use case for requesting a particular _rev of a
> document, is in resolving conflicts introduced by replication. So if
> we restrict access to old revs (by default) to an endpoint which gives
> an array of documents (each conflicted rev) then it won't be usable as
> a revision control system, only as a conflict resolution system. If
> there's not an easy way to think you have implemented a version
> control system (eg no API endpoint for accessing non-conflicting revs)
> I bet we'll see misapprehension of _rev happen a lot less.
>

Trying to get this thread back on track about an actual small concrete
change to the code we could make that might keep people from trying to
use _rev as a versioning system.

Reiterating: I think the clean solution is to remove the API for
loading docs at a particular rev. Instead we allow only the loading of
all conflicted revs (or of course the HEAD rev). I'll wait for people
to say why this is a bad idea before I say why it's a good idea.

Cheers,
Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

On Tue, Feb 24, 2009 at 3:50 AM, Damien Katz <da...@apache.org> wrote:
>
> With Chris Anderson's "show" document and "list" view work, we have the
> beginnings of that.
>

I was just going to reply with this point. The only thing I see as
missing to make CouchDB fully "RESTful" is hypermedia. When the
representational states are linked together in a way that can be
browsed, then we've got "high REST". Show and list do that, but they
require developer work. I like to call CouchDB RESTy, and with a
little bit of dev work, it can be RESTful.

This is exactly the slide that comes before my explanation of show and
list, in my latest talk notes: "But that's not REST!"

On another note, I was thinking about it some more, and I think that
renaming _rev to _cc would be a huge pain in the ass for a lot of
people (who don't go around abusing it) and it can probably be
avoided.

The only valid use case for requesting a particular _rev of a
document is resolving conflicts introduced by replication. So if
we restrict access to old revs (by default) to an endpoint which gives
an array of documents (each conflicted rev), then it won't be usable as
a revision control system, only as a conflict resolution system. If
there's no easy way to think you have implemented a version
control system (e.g. no API endpoint for accessing non-conflicting revs),
I bet we'll see misapprehension of _rev happen a lot less.
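
A rough sketch of what that restriction could look like, using a made-up
in-memory structure rather than anything in CouchDB. The "store" layout, the
function names, the revision ids and the allow_old_revs flag are all
hypothetical and only illustrate the idea of exposing the head and its
conflicts by default.

```python
def conflicted_revs(store):
    """Return the head and every conflicting revision as a list of bodies."""
    return [store["bodies"][rev] for rev in [store["head"], *store["conflicts"]]]


def load_rev(store, rev, allow_old_revs=False):
    """Return the body for rev, refusing non-leaf revisions unless enabled.

    store is a hypothetical dict with the keys "head", "conflicts" and
    "bodies"; allow_old_revs models an opt-in switch that is off by default.
    """
    leaves = [store["head"], *store["conflicts"]]
    if rev not in leaves and not allow_old_revs:
        raise PermissionError("old revisions are not exposed by default")
    return store["bodies"][rev]


store = {
    "head": "r2a",
    "conflicts": ["r2b"],
    "bodies": {"r1": {}, "r2a": {"speed": 1}, "r2b": {"speed": 3}},
}
print(conflicted_revs(store))  # [{'speed': 1}, {'speed': 3}]
```

Here load_rev(store, "r1") raises PermissionError, while passing
allow_old_revs=True returns the old body if it still happens to be stored.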

Chris

Stealing from Damien's blog header for my one-time special sig:

EVERYBODY KEEPS ON TALKING ABOUT IT
NOBODY'S GETTING IT DONE

(applies to myself as well)

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Damien Katz <da...@apache.org>.

On Feb 24, 2009, at 6:26 AM, Antony Blakey wrote:

>
> On 24/02/2009, at 9:29 PM, Jan Lehnardt wrote:
>
>> CouchDB documents are limited to JSON (application/json) as the
>> content, that doesn't make the API less RESTful. If that's not the
>> right answer, I don't understand what you mean.
>
> application/json doesn't define the semantics of the payload e.g.  
> how to interact with the resource. To do that it would have to be  
> application/json+couchdoc et al.
>
>>> and it uses externally defined URL structures to effect operations.
>>
>> Can you elaborate on that?
>
> To be RESTful, the means of constructing URLs needs to be defined by  
> the media type specification. For example, having ?rev= is a rule  
> that is external to both the media type and the document.
>
> A RESTful API would have a single entry point, with every other URL  
> and service constructed/discovered by processing the content,  
> applying the rules of the media type to the content to construct new  
> URLS, just like HTML. The HTML web doesn't have a manual describing  
> how to effect operations by constructing certain URLs beyond the  
> interpretation of the content.

With Chris Anderson's "show" document and "list" view work, we have
the beginnings of that.
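
The single-entry-point idea in the quoted text can be sketched as a client
that never builds a URL itself and only follows links found in the content.
The "links" field, the link relation names and the fake RESOURCES table are
invented for this sketch; they are not CouchDB responses or part of "show"
and "list".

```python
def follow(fetch, entry_url, rels):
    """Start at entry_url and follow the named link relations in order.

    fetch(url) must return a dict. Each dict is assumed to carry a "links"
    mapping from relation name to URL (a made-up convention for this sketch).
    """
    doc = fetch(entry_url)
    for rel in rels:
        doc = fetch(doc["links"][rel])
    return doc


# A fake in-memory "server" so the sketch runs without a network.
RESOURCES = {
    "/": {"links": {"databases": "/dbs"}},
    "/dbs": {"links": {"t2": "/dbs/t2"}},
    "/dbs/t2": {"links": {"terminator": "/dbs/t2/terminator"}},
    "/dbs/t2/terminator": {"speed": 1, "links": {}},
}

doc = follow(RESOURCES.__getitem__, "/", ["databases", "t2", "terminator"])
print(doc["speed"])  # 1
```

The client only knows the entry point "/" and the relation names; if the
server changed its URL layout, the same call would keep working as long as
the links in the content were updated.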

-Damien

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 24 Feb 2009, at 12:54, Antony Blakey wrote:

>
> On 24/02/2009, at 10:11 PM, Robert Dionne wrote:
>
>> I read this thesis ages ago, and technically you are correct, if
>> somewhat pedantic. I think CouchDB captures the gist of being
>> RESTful and certainly from a marketing perspective it's timely.
>
> That's why I say it's a marketing issue. Surely we shouldn't copy  
> Microsoft's marketing tactics and deliberately misuse a term for  
> marketing reasons. The site should say 'HTTP API'.
>
>> When I mention to potential customers that CouchDB database are  
>> accessed with URIs they say "oh it uses this new REST stuff,  
>> cool".  Often we have little choice over how the world takes an  
>> idea and runs with it.
>
> But we don't have to be complicit. And remember this isn't about the  
> world taking an *idea*. It's about people wanting a cool label to  
> stick on their project, even if the label doesn't fit.
>
> What term would you suggest for a service that fulfills Fielding's  
> definition? Certainly the benefits of being 'RESTful' according to  
> his definition don't flow on to CouchDB, because it's NOT actually  
> RESTful.

Yes, you were right, let's not fire up this argument.

Sorry for the noise.

Cheers
Jan
--

Re: Fail on a simple case on replication

Posted by Robert Dionne <di...@dionne-associates.com>.

Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961

On Feb 24, 2009, at 6:54 AM, Antony Blakey wrote:

>
> On 24/02/2009, at 10:11 PM, Robert Dionne wrote:
>
>> I read this thesis ages ago, and technically you are correct, if
>> somewhat pedantic. I think CouchDB captures the gist of being
>> RESTful and certainly from a marketing perspective it's timely.
>
> That's why I say it's a marketing issue. Surely we shouldn't copy  
> Microsoft's marketing tactics and deliberately misuse a term for  
> marketing reasons. The site should say 'HTTP API'.

It's an exaggeration to suggest that CouchDB's use of the term REST
is akin to Microsoft's marketing tactics. Nor is it a matter of being
complicit. Your argument that it is not RESTful is similar to saying
someone is not a good Catholic because they eat meat on Fridays and
subscribe to other reforms of recent Vatican councils. REST is an
interesting idea but let's face it, with all due respect to Roy
Fielding, it's merely a statement that this is how the web works and
what makes it work well. It generated excitement, I think, largely as a
contrast to the ugliness of SOAP. I'm happy it produced a readable
thesis.

In fact the fuzziness of the idea explains why there are so many  
arguments about what's RESTful or not.

>
>> When I mention to potential customers that CouchDB database are  
>> accessed with URIs they say "oh it uses this new REST stuff,  
>> cool".  Often we have little choice over how the world takes an  
>> idea and runs with it.
>
> But we don't have to be complicit. And remember this isn't about  
> the world taking an *idea*. It's about people wanting a cool label  
> to stick on their project, even if the label doesn't fit.
>
> What term would you suggest for a service that fulfills Fielding's  
> definition? Certainly the benefits of being 'RESTful' according to  
> his definition don't flow on to CouchDB, because it's NOT actually  
> RESTful.
>
> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> A Man may make a Remark –
> In itself – a quiet thing
> That may furnish the Fuse unto a Spark
> In dormant nature – lain –
>
> Let us divide – with skill –
> Let us discourse – with care –
> Powder exists in Charcoal –
> Before it exists in Fire –
>
>   -– Emily Dickinson 913 (1865)
>
>

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 10:11 PM, Robert Dionne wrote:

> I read this thesis ages ago, and technically you are correct, if
> somewhat pedantic. I think CouchDB captures the gist of being
> RESTful and certainly from a marketing perspective it's timely.

That's why I say it's a marketing issue. Surely we shouldn't copy  
Microsoft's marketing tactics and deliberately misuse a term for  
marketing reasons. The site should say 'HTTP API'.

> When I mention to potential customers that CouchDB database are  
> accessed with URIs they say "oh it uses this new REST stuff, cool".   
> Often we have little choice over how the world takes an idea and  
> runs with it.

But we don't have to be complicit. And remember this isn't about the  
world taking an *idea*. It's about people wanting a cool label to  
stick on their project, even if the label doesn't fit.

What term would you suggest for a service that fulfills Fielding's  
definition? Certainly the benefits of being 'RESTful' according to his  
definition don't flow on to CouchDB, because it's NOT actually RESTful.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –

Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –

   -– Emily Dickinson 913 (1865)

Re: Fail on a simple case on replication

Posted by Robert Dionne <di...@dionne-associates.com>.

Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961



On Feb 24, 2009, at 6:26 AM, Antony Blakey wrote:

>
> On 24/02/2009, at 9:29 PM, Jan Lehnardt wrote:
>
>> CouchDB documents are limited to JSON (application/json) as the
>> content, that doesn't make the API less RESTful. If that's not the
>> right answer, I don't understand what you mean.
>
> application/json doesn't define the semantics of the payload e.g.  
> how to interact with the resource. To do that it would have to be  
> application/json+couchdoc et al.
>
>>> and it uses externally defined URL structures to effect operations.
>>
>> Can you elaborate on that?
>
> To be RESTful, the means of constructing URLs needs to be defined  
> by the media type specification. For example, having ?rev= is a  
> rule that is external to both the media type and the document.
>
> A RESTful API would have a single entry point, with every other URL  
> and service constructed/discovered by processing the content,  
> applying the rules of the media type to the content to construct  
> new URLS, just like HTML. The HTML web doesn't have a manual  
> describing how to effect operations by constructing certain URLs  
> beyond the interpretation of the content.
>
>> Why then have folks like Sam Ruby* or Tim Bray not objected yet?
>> Not trying to pick a fight here, I'm just wondering if you are  
>> interpreting
>> "the spec" a little too strict?
>
> The term is defined by Roy Fielding's thesis, and he has objected
> to the misuse of the term:
> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
> And the next post: http://roy.gbiv.com/untangled/2008/specialization
> is also good.

Antony,

   I read this thesis ages ago, and technically you are correct, if
somewhat pedantic. I think CouchDB captures the gist of being RESTful,
and certainly from a marketing perspective it's timely. When I
mention to potential customers that CouchDB databases are accessed
with URIs they say "oh it uses this new REST stuff, cool".  Often we
have little choice over how the world takes an idea and runs with it.

Regards,

Bob



>
>>> My argument in this context is pointless. I know it's not going  
>>> to change.
>>
>> How about not trying to subtly create "them-and-us" situation? It  
>> seems
>> strange given that you clarified a statement about "the PMC"  
>> earlier in
>> this thread to avoid misinterpretation (thanks). Also, you never  
>> brought
>> this up, so how do you know it is not going to change?
>
> I have brought this up before on couchdb-user@incubator.apache.org  
> - 15 November 2008, Subject: RESTful? (was: Re: Document Updates).  
> Apache archives don't cover that time on that list.
>
> Hence, my comment, - let's not fire up this argument. I meant that  
> I wasn't going to waste m/l bandwidth rehashing an argument that  
> has already been done and dusted in this context.
>
> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> The project was so plagued by politics and ego that when the  
> engineers requested technical oversight, our manager hired a  
> psychologist instead.
>  -- Ron Avitzur
>

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 9:29 PM, Jan Lehnardt wrote:

> CouchDB documents are limited to JSON (application/json) as the
> content, that doesn't make the API less RESTful. If that's not the
> right answer, I don't understand what you mean.

application/json doesn't define the semantics of the payload e.g. how  
to interact with the resource. To do that it would have to be  
application/json+couchdoc et al.

>> and it uses externally defined URL structures to effect operations.
>
> Can you elaborate on that?

To be RESTful, the means of constructing URLs needs to be defined by  
the media type specification. For example, having ?rev= is a rule that  
is external to both the media type and the document.

A RESTful API would have a single entry point, with every other URL  
and service constructed/discovered by processing the content, applying  
the rules of the media type to the content to construct new URLS, just  
like HTML. The HTML web doesn't have a manual describing how to effect  
operations by constructing certain URLs beyond the interpretation of  
the content.
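
The distinction can be sketched in a few lines of JavaScript. This is only a toy illustration under stated assumptions: the in-memory `resources` table, the link relation names, the paths, and the helper names `get`, `follow` and `buildRevisionUrl` are all invented here and are not part of CouchDB's API. The first client discovers every URL from the content it receives; the second builds a URL from a rule that lives outside the content, as with ?rev=.

```javascript
// Toy in-memory "server". Every path and link relation below is hypothetical,
// invented for illustration; none of it is CouchDB's real API.
const resources = {
  "/": { links: { documents: "/docs" } },
  "/docs": { links: { terminator: "/docs/terminator" } },
  "/docs/terminator": {
    body: { speed: 1 },
    links: { "previous-revision": "/docs/terminator/r1" },
  },
  "/docs/terminator/r1": { body: {}, links: {} },
};

// Stand-in for an HTTP GET against the toy server.
function get(url) {
  const resource = resources[url];
  if (!resource) throw new Error("not found: " + url);
  return resource;
}

// Hypertext-driven client: it knows only the entry point and the link
// relation names. Every URL it dereferences was handed to it in a response.
function follow(entryPoint, rels) {
  let current = get(entryPoint);
  for (const rel of rels) {
    const next = current.links[rel];
    if (next === undefined) throw new Error("no link with rel: " + rel);
    current = get(next);
  }
  return current;
}

// Out-of-band client: it builds the URL from a rule that is documented
// outside both the media type and the document.
function buildRevisionUrl(db, docId, rev) {
  return "/" + db + "/" + encodeURIComponent(docId) + "?rev=" + encodeURIComponent(rev);
}

const previous = follow("/", ["documents", "terminator", "previous-revision"]);
console.log(previous.body);
console.log(buildRevisionUrl("t2", "terminator", "1234"));
```

In the first style the server can rearrange its URLs freely as long as the link relations keep their meaning; in the second, the URL rule is part of the contract that clients depend on.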

> Why then have folks like Sam Ruby* or Tim Bray not objected yet?
> Not trying to pick a fight here, I'm just wondering if you are  
> interpreting
> "the spec" a little too strict?

The term is defined by Roy Fielding's thesis, and he has objected to
the misuse of the term:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven.
And the next post: http://roy.gbiv.com/untangled/2008/specialization
is also good.

>> My argument in this context is pointless. I know it's not going to  
>> change.
>
> How about not trying to subtly create "them-and-us" situation? It  
> seems
> strange given that you clarified a statement about "the PMC" earlier  
> in
> this thread to avoid misinterpretation (thanks). Also, you never  
> brought
> this up, so how do you know it is not going to change?

I have brought this up before on couchdb-user@incubator.apache.org -  
15 November 2008, Subject: RESTful? (was: Re: Document Updates).  
Apache archives don't cover that time on that list.

Hence, my comment, - let's not fire up this argument. I meant that I  
wasn't going to waste m/l bandwidth rehashing an argument that has  
already been done and dusted in this context.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The project was so plagued by politics and ego that when the engineers  
requested technical oversight, our manager hired a psychologist instead.
  -- Ron Avitzur

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 24 Feb 2009, at 11:44, Antony Blakey wrote:

>
> On 24/02/2009, at 9:02 PM, Jan Lehnardt wrote:
>
>> Hi Antony,
>>
>> On 24 Feb 2009, at 00:34, Antony Blakey wrote:
>>> <flamesuit on>
>>> OTOH, one should use the correct term and not redefine existing  
>>> terms to suit one's own purpose. In a tangentially related way,  
>>> the use of the term RESTful wrt CouchDB is a marketing abomination.
>>> </flamesuit off>
>>
>> I've heard that before. CouchDB's core document API is as
>> RESTful as it gets. But not all of CouchDB's API is RESTful
>> and it wouldn't even make sense. I don't see any abomination
>> going on here. Thanks.
>
> Couch's core document API is not RESTful. It doesn't use a specific  
> media type to define the interpretation of the content,

CouchDB documents are limited to JSON (application/json) as the
content, that doesn't make the API less RESTful. If that's not the
right answer, I don't understand what you mean.

> and it uses externally defined URL structures to effect operations.

Can you elaborate on that?

> That's not RESTful, and I don't think CouchDB should use the term.

Why then have folks like Sam Ruby* or Tim Bray not objected yet?
Not trying to pick a fight here, I'm just wondering if you are
interpreting "the spec" a little too strictly?

*Sam having co-written "RESTful Web Services" for O'Reilly and being
chiefly responsible for CouchDB's incubation at the ASF.

> My argument in this context is pointless. I know it's not going to  
> change.

How about not trying to subtly create a "them-and-us" situation? It seems
strange given that you clarified a statement about "the PMC" earlier in
this thread to avoid misinterpretation (thanks). Also, you never brought
this up, so how do you know it is not going to change?

Cheers
Jan
--

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 9:02 PM, Jan Lehnardt wrote:

> Hi Antony,
>
> On 24 Feb 2009, at 00:34, Antony Blakey wrote:
>> <flamesuit on>
>> OTOH, one should use the correct term and not redefine existing  
>> terms to suit one's own purpose. In a tangentially related way, the  
>> use of the term RESTful wrt CouchDB is a marketing abomination.
>> </flamesuit off>
>
> I've heard that before. CouchDB's core document API is as
> RESTful as it gets. But not all of CouchDB's API is RESTful
> and it wouldn't even make sense. I don't see any abomination
> going on here. Thanks.

Couch's core document API is not RESTful. It doesn't use a specific  
media type to define the interpretation of the content, and it uses  
externally defined URL structures to effect operations. That's not  
RESTful, and I don't think CouchDB should use the term.

My argument in this context is pointless. I know it's not going to  
change.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The project was so plagued by politics and ego that when the engineers  
requested technical oversight, our manager hired a psychologist instead.
   -- Ron Avitzur

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

Hi Antony,

On 24 Feb 2009, at 00:34, Antony Blakey wrote:
> <flamesuit on>
> OTOH, one should use the correct term and not redefine existing  
> terms to suit one's own purpose. In a tangentially related way, the  
> use of the term RESTful wrt CouchDB is a marketing abomination.
> </flamesuit off>

I've heard that before. CouchDB's core document API is as
RESTful as it gets. But not all of CouchDB's API is RESTful
and it wouldn't even make sense. I don't see any abomination
going on here. Thanks.

Your point behind the flame, not redefining existing terms:
The existing notion of a revision is that it is something you
can go back to. This is not what CouchDB revisions are, so
we are, right now, repurposing existing terminology.

I'm not saying revision is wrong, because it isn't. It's just not
a good choice for the API from a learning perspective. I understand
that an API has more perspectives than learning it, so
we need to find out where to make the trade-off.

<tongueincheek>
We're violating rules 1, 5, 6, 13, 14, and 15, probably more, of
Rusty's rules of hard-to-use interfaces,
http://www.pointy-stick.com/blog/2008/01/09/api-design-rusty-levels/
</tongueincheek>

> The documentation about replication, the role of revisions, the lack  
> of inter-document consistency guarantees (including, crucially to  
> the operation model, the lack of Monotonic Write guarantees), really  
> needs to be expanded.

> The consequences of CouchDB's underlying model aren't immediately  
> obvious, and should be spelled out, as I started to do here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c0FDDC57C-DB78-4241-86DE-549FECC8B558@gmail.com%3e 
>  - which was obviously in the context of changing that mechanism,  
> but still the explanation and references are useful.

The wiki is open for all and everybody here welcomes useful additions.

Cheers
Jan
--

Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

2009/2/24 Jan Lehnardt <ja...@apache.org>

>
> On 24 Feb 2009, at 13:52, Patrick Antivackis wrote:
>
>> It's like all politically correct terminology where you use a stupid
>>>> expression in order to be as neutral as possible.
>>>>
>>>>
>>> You have a point here, it is about avoiding conflict. But I don't think
>>> we're looking for a neutral term here, but one with a better name.
>>> I'd go with _access_token if it weren't too long. _rev is nice and short
>>> and _token might as well be _wibble. API design is hard.
>>>
>>>
>> May be it's about conflict, but as it's also a previous release, it's by
>> definition a revision. The fact that the revision is no more there is not
>> changing the fact that it's a revision.
>>
>
> Haha, language ambiguity for the win :) I meant conflict between
> users applying prior understanding of the term "revision" to CouchDB
> revisions causing a conflict. I did not mean using _rev as a token to
> manage write conflicts for a document. I need to be more careful with
> these words :)
>

Don't worry, I'm not a native English speaker either.


>
>
>
>  That's why if the name is changed, the functionality to access a previous
>> revision should be removed.
>>
>
> I could see that being a valid conclusion and I think that would be
> covered with disabling the feature by default and make it an opt-in
> like Damien suggested. We also could just nuke it completely and
> wait for complaints before reconsidering making it an opt-in.
>
>
Great, so my vote becomes: -0

>
>
> Cheers
> Jan
> --
>
>
>  --
>>>
>>>
>>>
>>>
>>> IMO if you change this
>>>
>>>> attribute name it's even better to remove all possibilities to a access
>>>> a
>>>> previous rev if still there, and change it's value by a timestamp
>>>>
>>>>
>>>> Regards
>>>>
>>>> 2009/2/24 Antony Blakey <an...@gmail.com>
>>>>
>>>>
>>>>  On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>>>>>
>>>>> The project founder and the PMC, are all committed to that replication
>>>>>
>>>>>  model, which is derived from Notes.
>>>>>>
>>>>>>
>>>>>>  BTW I'm the only one in the community that has expressed any strong
>>>>> desire
>>>>> to change this - I'm not implying any community division, just pointing
>>>>> out
>>>>> that it's both an historical artifact, and accepted by the major
>>>>> contributors and committers.
>>>>>
>>>>> Antony Blakey
>>>>> --------------------------
>>>>> CTO, Linkuistics Pty Ltd
>>>>> Ph: 0438 840 787
>>>>>
>>>>> Plurality is not to be assumed without necessity
>>>>> -- William of Ockham (ca. 1285-1349)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 24 Feb 2009, at 13:52, Patrick Antivackis wrote:
>>> It's like all politically correct terminology where you use a stupid
>>> expression in order to be as neutral as possible.
>>>
>>
>> You have a point here, it is about avoiding conflict. But I don't  
>> think
>> we're looking for a neutral term here, but one with a better name.
>> I'd go with _access_token if it weren't too long. _rev is nice and  
>> short
>> and _token might as well be _wibble. API design is hard.
>>
>
> May be it's about conflict, but as it's also a previous release,  
> it's by
> definition a revision. The fact that the revision is no more there  
> is not
> changing the fact that it's a revision.

Haha, language ambiguity for the win :) I meant conflict between
users applying prior understanding of the term "revision" to CouchDB
revisions causing a conflict. I did not mean using _rev as a token to
manage write conflicts for a document. I need to be more careful with
these words :)


> That's why if the name is changed, the functionality to access a  
> previous
> revision should be removed.

I could see that being a valid conclusion and I think that would be
covered with disabling the feature by default and make it an opt-in
like Damien suggested. We also could just nuke it completely and
wait for complaints before reconsidering making it an opt-in.


Cheers
Jan
--


>> --
>>
>>
>>
>>
>> IMO if you change this
>>> attribute name it's even better to remove all possibilities to a  
>>> access a
>>> previous rev if still there, and change it's value by a timestamp
>>>
>>>
>>> Regards
>>>
>>> 2009/2/24 Antony Blakey <an...@gmail.com>
>>>
>>>
>>>> On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>>>>
>>>> The project founder and the PMC, are all committed to that  
>>>> replication
>>>>
>>>>> model, which is derived from Notes.
>>>>>
>>>>>
>>>> BTW I'm the only one in the community that has expressed any strong
>>>> desire
>>>> to change this - I'm not implying any community division, just  
>>>> pointing
>>>> out
>>>> that it's both an historical artifact, and accepted by the major
>>>> contributors and committers.
>>>>
>>>> Antony Blakey
>>>> --------------------------
>>>> CTO, Linkuistics Pty Ltd
>>>> Ph: 0438 840 787
>>>>
>>>> Plurality is not to be assumed without necessity
>>>> -- William of Ockham (ca. 1285-1349)
>>>>
>>>>
>>>>
>>>>
>>

Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

Hi Jan,


>
>  Oh and by the way, in a use case where there is only one database and you
>> don't use compaction because you want to keep everything, well _rev is a
>> revision that can be used to see the history of the document.
>>
>
> You still shouldn't and that's what's in the documentation :) Just because
> you can tie a skateboard to a car and drive on the highway would make
> one hell of a fun ride, you are not advised to do so. :)
>

Don't worry ;) On my side I don't do this. Since I know when I will run
compaction, I run a program beforehand that takes care of
"archiving" previous revs.

>> I really don't
>> see the point of renaming an attribute to make it harder to understand it's
>> role.
>>
>
> The suggestion here is to rename to make it _easier_ to understand
> because the connotations "revision" comes with are not entirely
> valid for CouchDB.
>
>
>> It's like all politically correct terminology where you use a stupid
>> expression in order to be as neutral as possible.
>>
>
> You have a point here, it is about avoiding conflict. But I don't think
> we're looking for a neutral term here, but one with a better name.
> I'd go with _access_token if it weren't too long. _rev is nice and short
> and _token might as well be _wibble. API design is hard.
>

Maybe it's about conflict, but since it is also a previous release, it is by
definition a revision. The fact that the revision is no longer there doesn't
change the fact that it's a revision.

That's why if the name is changed, the functionality to access a previous
revision should be removed.



>
>
> Cheers
> Jan
> --
>
>
>
>
>  IMO if you change this
>> attribute name it's even better to remove all possibilities to a access a
>> previous rev if still there, and change it's value by a timestamp
>>
>>
>> Regards
>>
>> 2009/2/24 Antony Blakey <an...@gmail.com>
>>
>>
>>> On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>>>
>>> The project founder and the PMC, are all committed to that replication
>>>
>>>> model, which is derived from Notes.
>>>>
>>>>
>>> BTW I'm the only one in the community that has expressed any strong
>>> desire
>>> to change this - I'm not implying any community division, just pointing
>>> out
>>> that it's both an historical artifact, and accepted by the major
>>> contributors and committers.
>>>
>>> Antony Blakey
>>> --------------------------
>>> CTO, Linkuistics Pty Ltd
>>> Ph: 0438 840 787
>>>
>>> Plurality is not to be assumed without necessity
>>> -- William of Ockham (ca. 1285-1349)
>>>
>>>
>>>
>>>
>

Re: Fail on a simple case on replication

Posted by Brian Candler <B....@pobox.com>.

On Wed, Feb 25, 2009 at 08:07:39PM +1030, Antony Blakey wrote:
> There's _id when it's not supplied in a PUT, but that would be supplied 
> by the Location header in the result. The more I think about it, the more 
> I like this idea.
>
> A lot more work on the client side though to deal with e.g. view  
> results, and I wonder about the subsequent loss of convenience with e.g. 
> curl in that context, although I must admit I'm no curl guru with  
> multipart mime.

Yes, it needs thinking through. But suppose the lower layers of document
storage were separated off, so you just had:
- append document to database
- Btree indexing
- map and reduce
- replication and conflict resolution
- compaction

It would store a small amount of internal metadata (e.g. content-type, _rev,
perhaps a _sha1) plus the raw document as received, as a blob.

I can see immediate uses for this document store. For example, when
warehousing RADIUS accounting packets, I could just store them in their raw
binary form. This is not only smaller than JSON, and involves less
processing, but it is a more accurate representation of what was actually
received on the wire.

If I were to use this approach in today's CouchDB (as stub JSON object plus
an attachment), I would lose the ability to do map/reduce on the packet. (*)

Of course, such a map function would have to be quite smart, as it would be
parsing binary RADIUS packets to pull out the fields of interest, but there
are libraries which do that; and even if not for Javascript, there is
already the capability to do map/reduce processing in any other language.

Then there's the issue of what map and reduce functions should output.

I think it would be consistent if map functions could generate arbitrary
binary data, tagged with its own content-type (e.g. so you can have a map
function which converts an image/png to an image/jpeg)

This complicates reduce functions, which would have to receive both docs and
their corresponding mime-types (bundled together somehow, perhaps as a JSON
array)

Then I suppose reduce outputs (and re-reduce inputs) could also be arbitrary
MIME objects. It might be convenient to use JSON for these, but there's no
particular reason to enforce this. You might want to output plain-text
strings from your reduce function, or Ruby Marshal objects, or whatever was
convenient.

With great power comes great danger of shooting yourself in the foot, and so
a layer on top of this which *enforces* JSON would be a good-to-have too,
and could even be the default.
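
As a rough sketch of how the pieces described above might fit together, assuming nothing about CouchDB's real internals: `mapBlob`, `reducePairs` and `requireJson` are hypothetical names made up for this illustration, and the pairing of each value with its MIME type as a two-element array is just one possible bundling.

```javascript
// Hypothetical shapes throughout: none of these functions is a CouchDB API.

// A map function over raw documents. It receives the MIME type and the raw
// bytes, and returns zero or more [contentType, value] pairs.
function mapBlob(type, data) {
  if (type === "application/json") {
    const doc = JSON.parse(data.toString("utf8"));
    return [["text/plain", String(doc.speed)]];
  }
  if (type === "application/octet-stream") {
    // For example a binary packet: emit its length as plain text.
    return [["text/plain", String(data.length)]];
  }
  return []; // unknown types are skipped
}

// A reduce function receives the values bundled with their MIME types and
// returns a single tagged value of its own.
function reducePairs(pairs) {
  let total = 0;
  for (const [type, value] of pairs) {
    if (type === "text/plain") total += Number(value);
  }
  return ["text/plain", String(total)];
}

// Optional layer on top of the raw store that enforces JSON on the way in.
function requireJson(type, data) {
  if (type !== "application/json") {
    throw new Error("only application/json is accepted");
  }
  JSON.parse(data.toString("utf8")); // throws on malformed JSON
  return data;
}

const stored = [
  ["application/json", Buffer.from('{"speed": 1}')],
  ["application/octet-stream", Buffer.from([1, 2, 3, 4])],
];
const mapped = stored.flatMap(([type, data]) => mapBlob(type, data));
console.log(reducePairs(mapped));
```

The JSON-enforcing layer is deliberately separate from the map and reduce functions, so a deployment could leave it switched on by default and still keep the raw store underneath.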

Regards,

Brian.

(*) This does suggest another approach which could give the same benefit:
allow map functions to have access to the document's attachments somehow.
But if this were to be bundled with the doc directly it would have to be
base64 encoded, since JSON doesn't permit binary strings. And it would need
to be made available on-demand somehow.

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 25/02/2009, at 8:49 AM, Brian Candler wrote:

> There's Content-Type (standard HTTP header in both directions), and  
> there's
> _rev (or previous _rev). The latter can be in the URL for a PUT, and  
> perhaps
> a header for a GET. If revisions were a document hash, there's the  
> standard
> Content-MD5 header, or the (less standard) Content-SHA1 header.
>
> What else am I missing?

There's _id when it's not supplied in a PUT, but that would be  
supplied by the Location header in the result. The more I think about  
it, the more I like this idea.

A lot more work on the client side though to deal with e.g. view  
results, and I wonder about the subsequent loss of convenience with  
e.g. curl in that context, although I must admit I'm no curl guru with  
multipart mime.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The project was so plagued by politics and ego that when the engineers  
requested technical oversight, our manager hired a psychologist instead.
   -- Ron Avitzur

Re: Fail on a simple case on replication

Posted by Brian Candler <B....@pobox.com>.

On Tue, Feb 24, 2009 at 11:17:20PM +1030, Antony Blakey wrote:
>
> On 24/02/2009, at 11:09 PM, Brian Candler wrote:
>
>> On a random tangent: has anyone considered a CouchDB-like system where
>> documents are raw blobs, rather than JSON? ISTM that:
>
> You'd need some way to attach/inject the metadata in both directions.

There's Content-Type (standard HTTP header in both directions), and there's
_rev (or previous _rev). The latter can be in the URL for a PUT, and perhaps
a header for a GET. If revisions were a document hash, there's the standard
Content-MD5 header, or the (less standard) Content-SHA1 header.
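
A minimal sketch of what such a request could look like, assuming Node.js and its built-in `crypto` module. The helper name `describeBlobPut`, the database name and the document id are hypothetical; placing the previous revision in a `rev` query parameter follows the suggestion above rather than any confirmed API for raw blobs.

```javascript
const crypto = require("crypto");

// Hypothetical helper: describes a PUT of a raw blob where all metadata
// travels in the URL and the headers instead of inside a JSON body.
function describeBlobPut(db, docId, body, contentType, previousRev) {
  let url = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
  if (previousRev !== undefined) {
    url += "?rev=" + encodeURIComponent(previousRev);
  }
  return {
    method: "PUT",
    url: url,
    headers: {
      "Content-Type": contentType,
      // Content-MD5 carries the base64-encoded MD5 digest of the body.
      "Content-MD5": crypto.createHash("md5").update(body).digest("base64"),
    },
  };
}

const request = describeBlobPut(
  "radius",
  "packet-1",
  Buffer.from("hello"),
  "application/octet-stream",
  "1"
);
console.log(request);
```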

What else am I missing?

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 11:09 PM, Brian Candler wrote:

> On a random tangent: has anyone considered a CouchDB-like system where
> documents are raw blobs, rather than JSON? ISTM that:

You'd need some way to attach/inject the metadata in both directions.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A reasonable man adapts himself to suit his environment. An  
unreasonable man persists in attempting to adapt his environment to  
suit himself. Therefore, all progress depends on the unreasonable man.
   -- George Bernard Shaw

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 25/02/2009, at 8:52 AM, Brian Candler wrote:

> On Tue, Feb 24, 2009 at 01:48:56PM +0100, Jan Lehnardt wrote:
>>> However, you must then be prepared for your database to be a single
>>> file
>>> which grows without bounds. If CouchDB wants to support this  
>>> model, it
>>> would
>>> be helpful if the data were stored in chunks which can be backed up
>>> separately.
>>
>> rsync? :)
>
> Doesn't work especially well on huge files.

What about this incremental backup strategy:

1. Split off the MVCC header
2. Compare the previous and current file lengths and split off the new  
tail
3. Backup the header and the tail
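
The three steps above can be sketched as follows. This is only an illustration under loud assumptions: it treats the database file as append-only apart from a fixed-size header at the start, and `HEADER_SIZE`, `incrementalBackup` and `restore` are made-up names and values, not CouchDB's real on-disk layout or tooling.

```javascript
// Assumption for illustration only: a fixed-size header at the start of the
// file, followed by data that is only ever appended to.
const HEADER_SIZE = 8;

// Returns the pieces to back up: the current header plus whatever was
// appended since the previous backup (which was previousLength bytes long).
function incrementalBackup(file, previousLength) {
  if (previousLength > file.length) {
    throw new Error("file shrank; take a full backup instead");
  }
  const header = file.subarray(0, HEADER_SIZE);
  const tail = file.subarray(Math.max(previousLength, HEADER_SIZE));
  return { header, tail };
}

// Rebuilds the current file from the previous full copy plus the increment.
function restore(previousFile, backup) {
  const oldBody = previousFile.subarray(HEADER_SIZE);
  return Buffer.concat([backup.header, oldBody, backup.tail]);
}

const before = Buffer.from("HEADER-1" + "doc-a");
const after = Buffer.from("HEADER-2" + "doc-a" + "doc-b");
const backup = incrementalBackup(after, before.length);
console.log(backup.tail.toString());
console.log(restore(before, backup).equals(after));
```

The size check matters because a compacted file is rewritten rather than appended to, so an increment computed against the old length would no longer be meaningful.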

> I'm not sure about 'kitchen sink', but I've seen desires expressed  
> for more
> pluggability; perhaps JSON could be a pluggable layer sitting on top  
> of a
> raw document store?

I've been thinking about the layering in CouchDB along these lines:

         MVCC Store

Replication | Map/Reduce

in order to allow different replication strategies. I think of CouchDB  
as a collection of features that can themselves be plugged together to  
build systems with different semantics.

Rather than making Couch more pluggable, we could make it more of a  
construction kit. There's already a flavour of this when you look at  
the .ini file that builds up a Couch server from different endpoints  
and daemons.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The truth does not change according to our ability to stomach it.
   -- Flannery O'Connor

Re: Fail on a simple case on replication

Posted by Brian Candler <B....@pobox.com>.

On Tue, Feb 24, 2009 at 01:48:56PM +0100, Jan Lehnardt wrote:
>> However, you must then be prepared for your database to be a single  
>> file
>> which grows without bounds. If CouchDB wants to support this model, it 
>> would
>> be helpful if the data were stored in chunks which can be backed up
>> separately.
>
> rsync? :)

Doesn't work especially well on huge files. Indeed, files >2GB aren't
handled well by many systems. If I had to migrate 200GB of data, I'd much
prefer 100 x 2GB than 1 x 200GB. It also has the advantage that 99 of those
files are not changing.

>> Just thinking out loud.
>
> This is quite interesting! :) I'd like to see such a system, but I'd  
> also like
> CouchDB not becoming an Apache-httpd style kitchen-sink for all things
> HTTP. Maybe Yaws is what you're looking for?

Yaws is just another Mochiweb, isn't it?

I'm not sure about 'kitchen sink', but I've seen desires expressed for more
pluggability; perhaps JSON could be a pluggable layer sitting on top of a
raw document store?

Regards,

Brian.

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 24 Feb 2009, at 13:39, Brian Candler wrote:

> On Tue, Feb 24, 2009 at 09:06:09AM +0100, Patrick Antivackis wrote:
>> Oh and by the way, in a use case where there is only one database  
>> and you
>> don't use compaction because you want to keep everything, well _rev  
>> is a
>> revision that can be used to see the history of the document.
>
> This is a good point. If you follow "accountants don't use erasers"  
> then you
> will never compact (and maybe you want a flag which prevents  
> compaction).

You wouldn't use revisions to keep records around, but proper documents.


> However, you must then be prepared for your database to be a single  
> file
> which grows without bounds. If CouchDB wants to support this model,  
> it would
> be helpful if the data were stored in chunks which can be backed up
> separately.

rsync? :)


> "Compaction" for saving space could be achieved by rewriting the  
> database,
> but keeping diffs for earlier revisions. At this point you would end  
> up with
> something roughly like git.
>
> On a random tangent: has anyone considered a CouchDB-like system where
> documents are raw blobs, rather than JSON? ISTM that:
>
> - it would save a lot of conversion between Erlang terms and JSON
> - it would remove the second-class nature of attachments
> - it would allow structured data to be stored in arbitrary formats  
> (e.g. XML)
> - it would allow map/reduce to work on binary data (e.g. use a map  
> function
>  to make thumbnails of all your jpegs)
> - you could still use JSON quite happily, e.g.
>
>  function map(type, data) {
>    if (type == "application/json") {
>      doc = evalcx(data);
>      ... continue as normal
>    }
>  }
>
> I guess some of the APIs would become a bit more awkward though. For
> example, bulk document insert would probably become MIME multipart.
>
> In principle, I think you could get today's CouchDB as a thin layer  
> on top
> of this. However, "attachments" do have interesting special  
> semantics (e.g.
> deleting a document deletes all its attachments) which might need some
> parent/child relationship between documents to maintain. Having that
> relationship between documents in a more general form could also be  
> useful.
>
> Just thinking out loud.

This is quite interesting! :) I'd like to see such a system, but I'd  
also like
CouchDB not becoming an Apache-httpd style kitchen-sink for all things
HTTP. Maybe Yaws is what you're looking for?

Cheers
Jan
--

Re: Fail on a simple case on replication

Posted by Brian Candler <B....@pobox.com>.

On Tue, Feb 24, 2009 at 09:06:09AM +0100, Patrick Antivackis wrote:
> Oh and by the way, in a use case where there is only one database and you
> don't use compaction because you want to keep everything, well _rev is a
> revision that can be used to see the history of the document.

This is a good point. If you follow "accountants don't use erasers" then you
will never compact (and maybe you want a flag which prevents compaction).

However, you must then be prepared for your database to be a single file
which grows without bounds. If CouchDB wants to support this model, it would
be helpful if the data were stored in chunks which can be backed up
separately.

"Compaction" for saving space could be achieved by rewriting the database,
but keeping diffs for earlier revisions. At this point you would end up with
something roughly like git.

On a random tangent: has anyone considered a CouchDB-like system where
documents are raw blobs, rather than JSON? ISTM that:

- it would save a lot of conversion between Erlang terms and JSON
- it would remove the second-class nature of attachments
- it would allow structured data to be stored in arbitrary formats (e.g. XML)
- it would allow map/reduce to work on binary data (e.g. use a map function
  to make thumbnails of all your jpegs)
- you could still use JSON quite happily, e.g.

  function map(type, data) {
    if (type == "application/json") {
      var doc = evalcx(data);
      // ... continue as normal
    }
  }

I guess some of the APIs would become a bit more awkward though. For
example, bulk document insert would probably become MIME multipart.

In principle, I think you could get today's CouchDB as a thin layer on top
of this. However, "attachments" do have interesting special semantics (e.g.
deleting a document deletes all its attachments) which might need some
parent/child relationship between documents to maintain. Having that
relationship between documents in a more general form could also be useful.

Just thinking out loud.

Regards,

Brian.

Re: Fail on a simple case on replication

Posted by Robert Dionne <di...@dionne-associates.com>.

Robert Dionne
Chief Programmer
dionne@dionne-associates.com
203.231.9961



On Feb 24, 2009, at 5:52 AM, Jan Lehnardt wrote:

> Hi Patrick,
>
> On 24 Feb 2009, at 09:06, Patrick Antivackis wrote:
>
>> Oh and by the way, in a use case where there is only one database  
>> and you
>> don't use compaction because you want to keep everything, well  
>> _rev is a
>> revision that can be used to see the history of the document.
>
> You still shouldn't and that's what's in the documentation :) Just  
> because
> you can tie a skateboard to a car and drive on the highway would make
> one hell of a fun ride, you are not advised to do so. :)
>
>
>> I really don't
>> see the point of renaming an attribute to make it harder to  
>> understand it's
>> role.
>
> The suggestion here is to rename to make it _easier_ to understand
> because the connotations "revision" comes with are not entirely
> valid for CouchDB.

I agree that this is important to fix. It is too easy to assume  
CouchDB supports revision history. A lot of folks made this mistake,  
myself included. It's really internal state needed for concurrency  
control, yet it's exposed to users and required to be maintained in  
the document. So it needs to be called something that reflects this  
internal use, like "_int_bit" or "_token" or "_cc_uid"
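
To illustrate the concurrency-control role described here, a toy in-memory sketch follows. It is not CouchDB's implementation: `TokenStore` and its counter-based tokens are invented for illustration. The point is that the token only gates writes, and the store keeps nothing but the latest body.

```javascript
// Toy in-memory store. Names and token format are illustrative only.
class TokenStore {
  constructor() {
    this.docs = new Map(); // id -> { token, body }
    this.counter = 0;
  }

  // Succeeds only when the caller presents the token of the current version
  // (or no token at all for a new document). Returns the new token.
  put(id, body, token) {
    const current = this.docs.get(id);
    const expected = current ? current.token : undefined;
    if (token !== expected) {
      throw new Error("conflict: stale or missing token");
    }
    const next = String(++this.counter);
    // Only the latest body is kept; earlier versions are not retained.
    this.docs.set(id, { token: next, body });
    return next;
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new TokenStore();
const t1 = store.put("terminator", {});
const t2 = store.put("terminator", { speed: 1 }, t1);
try {
  store.put("terminator", { speed: 2 }, t1); // t1 is stale by now
} catch (err) {
  console.log(err.message);
}
console.log(store.get("terminator"));
```

Seen this way, the value behaves like a write token rather than a handle on history, which is the reading the suggested names try to convey.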


>
>
>> It's like all politically correct terminology where you use a stupid
>> expression in order to be as neutral as possible.
>
> You have a point here, it is about avoiding conflict. But I don't  
> think
> we're looking for a neutral term here, but one with a better name.
> I'd go with _access_token if it weren't too long. _rev is nice and  
> short
> and _token might as well be _wibble. API design is hard.
>
>
> Cheers
> Jan
> --
>
>
>
>> IMO if you change this
>> attribute name it's even better to remove all possibilities to a  
>> access a
>> previous rev if still there, and change it's value by a timestamp
>>
>>
>> Regards
>>
>> 2009/2/24 Antony Blakey <an...@gmail.com>
>>
>>>
>>> On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>>>
>>> The project founder and the PMC, are all committed to that  
>>> replication
>>>> model, which is derived from Notes.
>>>>
>>>
>>> BTW I'm the only one in the community that has expressed any  
>>> strong desire
>>> to change this - I'm not implying any community division, just  
>>> pointing out
>>> that it's both an historical artifact, and accepted by the major
>>> contributors and committers.
>>>
>>> Antony Blakey
>>> --------------------------
>>> CTO, Linkuistics Pty Ltd
>>> Ph: 0438 840 787
>>>
>>> Plurality is not to be assumed without necessity
>>> -- William of Ockham (ca. 1285-1349)
>>>
>>>
>>>
>

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

Hi Patrick,

On 24 Feb 2009, at 09:06, Patrick Antivackis wrote:

> Oh and by the way, in a use case where there is only one database  
> and you
> don't use compaction because you want to keep everything, well _rev  
> is a
> revision that can be used to see the history of the document.

You still shouldn't and that's what's in the documentation :) Just  
because
you can tie a skateboard to a car and drive on the highway would make
one hell of a fun ride, you are not advised to do so. :)

> I really don't
> see the point of renaming an attribute to make it harder to  
> understand it's
> role.

The suggestion here is to rename to make it _easier_ to understand
because the connotations "revision" comes with are not entirely
valid for CouchDB.

> It's like all politically correct terminology where you use a stupid
> expression in order to be as neutral as possible.

You have a point here, it is about avoiding conflict. But I don't think
we're looking for a neutral term here, but one with a better name.
I'd go with _access_token if it weren't too long. _rev is nice and short
and _token might as well be _wibble. API design is hard.

Cheers
Jan
--

> IMO if you change this
> attribute name it's even better to remove all possibilities to a  
> access a
> previous rev if still there, and change it's value by a timestamp
>
>
> Regards
>
> 2009/2/24 Antony Blakey <an...@gmail.com>
>
>>
>> On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>>
>> The project founder and the PMC, are all committed to that  
>> replication
>>> model, which is derived from Notes.
>>>
>>
>> BTW I'm the only one in the community that has expressed any strong  
>> desire
>> to change this - I'm not implying any community division, just  
>> pointing out
>> that it's both an historical artifact, and accepted by the major
>> contributors and committers.
>>
>> Antony Blakey
>> --------------------------
>> CTO, Linkuistics Pty Ltd
>> Ph: 0438 840 787
>>
>> Plurality is not to be assumed without necessity
>> -- William of Ockham (ca. 1285-1349)
>>
>>
>>

Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

Oh and by the way, in a use case where there is only one database and you
don't use compaction because you want to keep everything, well _rev is a
revision that can be used to see the history of the document. I really don't
see the point of renaming an attribute to make it harder to understand its
role. It's like all politically correct terminology where you use a stupid
expression in order to be as neutral as possible. IMO if you change this
attribute name it's even better to remove all possibility of accessing a
previous rev if still there, and replace its value with a timestamp.

Regards

2009/2/24 Antony Blakey <an...@gmail.com>

>
> On 24/02/2009, at 12:51 PM, Antony Blakey wrote:
>
>  The project founder and the PMC, are all committed to that replication
>> model, which is derived from Notes.
>>
>
> BTW I'm the only one in the community that has expressed any strong desire
> to change this - I'm not implying any community division, just pointing out
> that it's both an historical artifact, and accepted by the major
> contributors and committers.
>
> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Plurality is not to be assumed without necessity
>  -- William of Ockham (ca. 1285-1349)
>
>
>

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 12:51 PM, Antony Blakey wrote:

> The project founder and the PMC, are all committed to that  
> replication model, which is derived from Notes.

BTW I'm the only one in the community that has expressed any strong  
desire to change this - I'm not implying any community division, just  
pointing out that it's both an historical artifact, and accepted by  
the major contributors and committers.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Plurality is not to be assumed without necessity
   -- William of Ockham (ca. 1285-1349)

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 12:15 PM, Chris Anderson wrote:

>>> Would it be overly difficult to just add in the ability to keep a  
>>> full rev
>>> history based on a config setting?
>
> This would be a pretty big change. As Antony says, once you go down
> that path a little, you end up at something that is not really much
> like Couch.

I don't want to re-open a dead issue, but to clarify this - there are  
other models of replication that provide stronger weak-consistency  
guarantees - I urge you to read a few Bayou papers if you are  
interested. Using such replication would be very close to Couch. So I  
don't agree with the strength of Chris's comment.

The issue however, is that Couch's identity is, and has always been,  
largely determined by its replication model. There's so much more to
Couch that is independent of that, such as map/reduce views, forms,
futon, an HTTP API, JSON etc, that it's not immediately obvious that
it's the *replication model* that makes this product 'CouchDB'. The
project founder and the PMC are all committed to that replication
model, which is derived from Notes.

You can add all of the other Couch features, and in fact reuse all of  
the Couch code, with a different replication model, but it's unlikely  
it would be accepted into the Couch code base. If you want that, you  
need to fork and call it something different (which is what I'm  
doing). It's important to note however that the Couch replication  
model has some characteristics that cannot be achieved using any  
stronger form of consistency. In fact, technically speaking, Couch  
provides coherence, but NO consistency.

Given all of that, it would be good to have a very clear 'What is  
Couch' that emphasizes the primacy of the replication model (and its
implications, both pro and con), because none of the other things IMO
are as central to the identity, as consequential, or as confusing
(except maybe reduce/re-reduce) as the operational semantics of the
replication model.

As an aside to this (and I'm not being bolshy), looking further ahead,  
Eventual Consistency, which seems to be promoted as an article of  
faith, is not *strictly* achievable in a partial replication  
environment. Achieving Eventual Consistency is also dependent on some  
other constraints, so depending on your deployment model, it can be  
more theoretical than practical. At the end of the day however,  
dealing with non-Monotonic Writes subsumes dealing with Eventual  
Consistency in all but asymptotic senses.

These are all points that I think should be made clearly and up front  
in the documentation, because a failure to understand Couch's  
replication model, and the implications for applications, both pro and  
con, will IMO lead to failures that will be blamed on Couch, but are
in fact due to misunderstanding. You don't want a 'Couch is a piece of  
shit' meme to establish. IMO the bulk of Couch users will not think  
this through themselves, because they will be tool users, not tool  
builders.

> There's yet to be a really clear reference for how to do
> application-versioned documents in CouchDB. Hopefully we'll address
> the topic in the book, but we haven't gotten that far yet.
>
> The way I see it, the salient options are:
>
> A) leave it as _rev and answer the versioning question every week  
> forever
> B) rename it to _mvcc or _lock or _token or something else that
> doesn't confuse people
>
> The main drawback of B is that when we start renaming _rev, someone
> else comes along and tries to take the opportunity to change _id, or
> otherwise change the whole system. If we can stick to just renaming to
> something clearer, I'm happy to go ahead with this.

Orthogonally, I still think the id and rev should be wrapped in a  
_meta tag, but modulo that ...

It's not a _lock. Saying it's a _token has nothing to do with its
function - it would be like calling a car a 'construct of metal'. It's  
not _mvcc because that's the name of a technique, not a thing.

Maybe _mvcc_commit_id - although in the current implementation it  
isn't, it philosophically is and could be implemented that way. But  
really, it is a document version/revision identifier. Maybe put  
'couch' in there to emphasize the internal nature of it e.g.  
'_couch_rev_id' i.e. something which, at the limit might be  
'_couch_private_revision_id_which_you_should_treat_as_opaque'.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

The intuitive mind is a sacred gift and the rational mind is a  
faithful servant. We have created a society that honours the servant  
and has forgotten the gift.
   -- Albert Einstein

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 1:00 PM, Damien Katz wrote:

> I think if we change from _rev to something else, _cc for  
> concurrency control is good. I'm not sure this is necessary.

Concurrency control describes how it got there, but it's not the thing  
itself e.g. it's not 'the concurrency control' it's an artifact of a  
concurrency control mechanism. It would be good to have the name of  
the thing describe it accurately.

I vote for this staying as is and being handled in the documentation,  
where it belongs. The API doesn't approach a descriptive semantics.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?
His goal: transcend dental medication.

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 1:39 PM, Jeff Hinrichs - DM&T wrote:

> scenario, master-slave -- slaves only keep the most recent, while the
> master keeps complete. conflict resolution is handled solely by the
> master.
> scenario, first-among-equals -- multi-master where a single master is
> used as the basis for conflict resolution, other masters keep only a
> limited rev history and escalate to the first-among-equals when
> Eventual Consistency can not be reached do to missing rev history on a
> peer.

My initial thought is that Eventual Consistency amongst a set of peers  
is dependent purely on anti-entropy guarantees, no partial  
replication, and deterministic conflict resolution. The idea of  
Eventual Consistency is that the peers will end up with the same  
database contents. Given pruning or revision stemming, this is only  
true if you regard database equality as not including revision history  
e.g. it's the head revision + conflicts.

But then I developed a thought experiment where this wasn't clear - in  
particular whether you can rely on conflicting versions being included  
in the state. Consider 4 peers, with one document, each peer having a  
different revision.

P1 = [ A1 ] & P2 = [ A2 ] & P3 = [ A3 ] & P4 = [ A4 ]

Now bidirectionally replicate P1 & P2: [ A1, A5=A2/A1 ], where X=Y/Z
means X is the head revision, identical to Y but with a conflict
reference to Z. This is Eventual Consistency for P1 & P2.

Now bidirectionally replicate P3 & P4: [ A3, A6=A4/A3 ]. This is
Eventual Consistency for P3 & P4.

Now replicate P1 and P3. Obviously one of { A5, A6 } will be chosen  
deterministically, but are the conflicts chained? Do you end up with  
[ A2, A3, A5=A2/A1, A7=A4/[A3,A5] ]. How is it intended to treat the  
conflicts already present in documents undergoing conflict resolution?

Even writing this out is hard.
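
One way to make the thought experiment easier to reason about is to run it as code. The following Python sketch rests on assumptions that are not taken from Couch's implementation: each peer holds a flat set of leaf revisions for the document, bidirectional replication leaves both peers with the union of their sets, and the winner is picked by a made-up deterministic rule (highest revision id). Under those assumptions no new revisions such as A5 or A6 are created and conflicts are not chained.

```python
def sync(peers, x, y):
    """Bidirectional replication: both peers end up with the union of leaves."""
    merged = peers[x] | peers[y]
    peers[x] = merged
    peers[y] = merged


def winner(leaves):
    """Hypothetical deterministic rule: the highest revision id wins."""
    return max(leaves)


def conflicts(leaves):
    """Every leaf that is not the winner stays visible as a conflict."""
    return sorted(leaves - {winner(leaves)})


peers = {
    "P1": frozenset({"A1"}),
    "P2": frozenset({"A2"}),
    "P3": frozenset({"A3"}),
    "P4": frozenset({"A4"}),
}

sync(peers, "P1", "P2")
sync(peers, "P3", "P4")
sync(peers, "P1", "P3")
after_three_syncs = {name: sorted(leaves) for name, leaves in peers.items()}

# P2 and P4 have not yet seen the other pair; two more syncs close the gap.
sync(peers, "P2", "P1")
sync(peers, "P4", "P3")
final_states = {(winner(leaves), tuple(conflicts(leaves))) for leaves in peers.values()}
```

Because set union does not depend on the order of the syncs, every peer ends with the same winner and the same flat conflict list in this model. Whether the real replicator behaves like this is exactly the open question above.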

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Always have a vision. Why spend your life making other people’s dreams?
  -- Orson Welles (1915-1985)

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 24 Feb 2009, at 04:09, Jeff Hinrichs - DM&T wrote:

> On Mon, Feb 23, 2009 at 8:43 PM, Chris Anderson <jc...@apache.org>  
> wrote:
>> On Mon, Feb 23, 2009 at 6:30 PM, Damien Katz <da...@apache.org>  
>> wrote:
>>
>>> Maybe we should change that use from ?rev... to ?conflict=
>>
>> If we follow your _cc idea, we could change from ?rev= to ?cc=
>>
>>>
>>> I think if we change from _rev to something else, _cc for  
>>> concurrency
>>> control is good. I'm not sure this is necessary.
>>
>> yes, if we make the change _cc is the best so far. I can already
>> imagine office workers thinking it stands for "conflict catcher".
>>
>>>
>>> Maybe we should only allow the ability to getting old revisions
>>> (?disk_rev=...) with a setting in the ini, defaulting it off. That
>>> discourages it's use as general purpose mechanism, but is easy to  
>>> turn on if
>>> you really need it.
>>>
>>
>> Not a bad idea. The idea that you can't depend on it being available
>> would discourage apps from attempting to use _cc as an easy way to
>> provide undo functionality for users. Undo is a good feature, but  
>> undo
>> that sometimes randomly has been compacted away is worse than no  
>> undo.
> I would point out that compaction is not a random event.  It is
> controlled by the admin, correct?  To my knowledge, couch does not
> spontaneously compact nor even currently support the idea of automated
> compaction.

You're right, but that's one use-case. I'm used to thinking in the
mindset of a shared hosting provider where users do all sorts of crazy
things and the admins need to be able to control sensible operation. I'm
not saying this is the only other use-case but I'm seeing the general
case of CouchDB users not being admins and thus not controlling
compaction. Hence, old revisions could go away at any time and undo
should not rely on the revision system to provide this functionality.
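
A toy model of why undo cannot lean on old revisions, written in Python. CompactingStore and its methods are invented for this sketch and do not mirror CouchDB internals. The one assumption carried over from this thread is that the list of revision ids survives while the bodies of old revisions may be dropped by compaction.

```python
class CompactingStore:
    """Toy model: revision ids are remembered, old bodies may vanish."""

    def __init__(self):
        self.rev_ids = []  # every revision id ever issued, always kept
        self.bodies = {}   # rev id -> body, pruned by compact()

    def write(self, body):
        rev_id = str(len(self.rev_ids) + 1)
        self.rev_ids.append(rev_id)
        self.bodies[rev_id] = dict(body)
        return rev_id

    def read(self, rev_id=None):
        rev_id = rev_id or self.rev_ids[-1]
        return self.bodies.get(rev_id)  # None models a "not found" reply

    def compact(self):
        # Keep only the body of the latest revision.
        latest = self.rev_ids[-1]
        self.bodies = {latest: self.bodies[latest]}


store = CompactingStore()
store.write({})
store.write({"speed": 1})

before = store.read("1")  # the old body is still on disk
store.compact()           # triggered by whoever administers the server
after = store.read("1")   # the id is still listed, the body is gone
```

An undo feature built on read("1") works until somebody else runs compact(), which is the situation of a user who is not the admin.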

> <devil's advocate>
> Also, earlier in the thread, Dean L, suggested allowing unlimited rev
> history.  I think that his idea has merit in light of a talked about
> patch that would limit revs history to length N.  If the ability to
> control the size(N) of rev history is in the cards, why not allow N to
> be infinity?  Before you just dismiss the idea, I would state that I
> could see usefulness for this in special cases and remind you of the
> old saw, "Accountants don't use erasers." And in the new age of
> security and compliance, Auditors don't like erasers.

We're not dismissing the usefulness. By no means, but the revision
system is the wrong place to put this from CouchDB's design standpoint.

A little history.

I've been bugging Damien to implement this ever since I started playing
with CouchDB in October '06*. Our final discussion (about a year ago)
concluded that I should come up with an RFC that covers all angles
of distributed editing and replication and makes this easy to
implement.
He said he couldn't come up with an easy way yet and that there are
edge cases lurking that are really hard. Based on his experience with
distributed databases and my lack of it, I stopped the bugging. If I ever
have the time to think this all through I might propose an RFC covering
all angles and maybe a patch, but until then I keep quiet and work on
providing alternatives for users, like the chapter in the Couch Book that
Chris hinted at.

Not saying it can't be done, but it is harder than it may sound, however
useful it is.

Lastly, there are so many other areas where the current CouchDB needs
improvement and where the vision and implementation details are much
clearer. Let's tackle them first.

Cheers
Jan
--
* I'm not trying to be intimidating or showing off my cool, I'm just
pointing out that Damien has been saying "no" (well, "not yet") to this
feature for quite some time and for good reasons.

Re: Fail on a simple case on replication

Posted by Jeff Hinrichs - DM&T <du...@gmail.com>.

On Mon, Feb 23, 2009 at 8:43 PM, Chris Anderson <jc...@apache.org> wrote:
> On Mon, Feb 23, 2009 at 6:30 PM, Damien Katz <da...@apache.org> wrote:
>
>> Maybe we should change that use from ?rev... to ?conflict=
>
> If we follow your _cc idea, we could change from ?rev= to ?cc=
>
>>
>> I think if we change from _rev to something else, _cc for concurrency
>> control is good. I'm not sure this is necessary.
>
> yes, if we make the change _cc is the best so far. I can already
> imagine office workers thinking it stands for "conflict catcher".
>
>>
>> Maybe we should only allow the ability to getting old revisions
>> (?disk_rev=...) with a setting in the ini, defaulting it off. That
>> discourages it's use as general purpose mechanism, but is easy to turn on if
>> you really need it.
>>
>
> Not a bad idea. The idea that you can't depend on it being available
> would discourage apps from attempting to use _cc as an easy way to
> provide undo functionality for users. Undo is a good feature, but undo
> that sometimes randomly has been compacted away is worse than no undo.
I would point out that compaction is not a random event.  It is
controlled by the admin, correct?  To my knowledge, couch does not
spontaneously compact nor even currently support the idea of automated
compaction.

>
>
<devil's advocate>
Also, earlier in the thread, Dean L, suggested allowing unlimited rev
history.  I think that his idea has merit in light of a talked about
patch that would limit revs history to length N.  If the ability to
control the size(N) of rev history is in the cards, why not allow N to
be infinity?  Before you just dismiss the idea, I would state that I
could see usefulness for this in special cases and remind you of the
old saw, "Accountants don't use erasers." And in the new age of
security and compliance, Auditors don't like erasers.

scenario, master-slave -- slaves only keep the most recent, while the
master keeps complete. conflict resolution is handled solely by the
master.
scenario, first-among-equals -- multi-master where a single master is
used as the basis for conflict resolution, other masters keep only a
limited rev history and escalate to the first-among-equals when
Eventual Consistency cannot be reached due to missing rev history on a
peer.

This is not an argument for changes to replication, or a desire for
replication with complete rev history.  Only to allow rev history size
of infinity.
</devil's advocate>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Regards,

Jeff Hinrichs

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

On Mon, Feb 23, 2009 at 6:30 PM, Damien Katz <da...@apache.org> wrote:

> Maybe we should change that use from ?rev... to ?conflict=

If we follow your _cc idea, we could change from ?rev= to ?cc=

>
> I think if we change from _rev to something else, _cc for concurrency
> control is good. I'm not sure this is necessary.

yes, if we make the change _cc is the best so far. I can already
imagine office workers thinking it stands for "conflict catcher".

>
> Maybe we should only allow the ability to getting old revisions
> (?disk_rev=...) with a setting in the ini, defaulting it off. That
> discourages it's use as general purpose mechanism, but is easy to turn on if
> you really need it.
>

Not a bad idea. The idea that you can't depend on it being available
would discourage apps from attempting to use _cc as an easy way to
provide undo functionality for users. Undo is a good feature, but undo
that sometimes randomly has been compacted away is worse than no undo.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Damien Katz <da...@apache.org>.

On Feb 23, 2009, at 8:45 PM, Chris Anderson wrote:

>>> Would it be overly difficult to just add in the ability to keep a  
>>> full rev
>>> history based on a config setting?
>
> This would be a pretty big change. As Antony says, once you go down
> that path a little, you end up at something that is not really much
> like Couch.
>
> There's yet to be a really clear reference for how to do
> application-versioned documents in CouchDB. Hopefully we'll address
> the topic in the book, but we haven't gotten that far yet.
>
> The way I see it, the salient options are:
>
> A) leave it as _rev and answer the versioning question every week  
> forever
> B) rename it to _mvcc or _lock or _token or something else that
> doesn't confuse people
>
> The main drawback of B is that when we start renaming _rev, someone
> else comes along and tries to take the opportunity to change _id, or
> otherwise change the whole system. If we can stick to just renaming to
> something clearer, I'm happy to go ahead with this.

I forgot when I posted this, we still need the ability to get conflict  
revisions, which also uses the ?rev=... syntax. Maybe we should change  
that use from ?rev... to ?conflict=...., since those rev ids show up  
in the _conflicts doc member.

I think if we change from _rev to something else, _cc for concurrency  
control is good. I'm not sure this is necessary.

Maybe we should only allow the ability to get old revisions
(?disk_rev=...) with a setting in the ini, defaulting it off. That
discourages its use as a general purpose mechanism, but is easy to turn
on if you really need it.

-Damien

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

>> Would it be overly difficult to just add in the ability to keep a full rev
>> history based on a config setting?

This would be a pretty big change. As Antony says, once you go down
that path a little, you end up at something that is not really much
like Couch.

There's yet to be a really clear reference for how to do
application-versioned documents in CouchDB. Hopefully we'll address
the topic in the book, but we haven't gotten that far yet.
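
Until such a reference exists, here is one possible shape for application-level versioning, as a Python sketch with a plain dict standing in for the database. The class, the "doc" and "version" fields and the "id:vN" naming are all assumptions made for this example, not an established CouchDB convention. The idea is to store every version as an ordinary document so that history never depends on _rev.

```python
class VersionedDocs:
    """Keeps history in ordinary documents instead of relying on _rev."""

    def __init__(self):
        self.docs = {}  # stands in for a database: doc id -> body

    def save(self, logical_id, body):
        # Next version number = number of versions already stored + 1.
        version = 1 + sum(1 for d in self.docs.values() if d["doc"] == logical_id)
        doc_id = f"{logical_id}:v{version}"
        self.docs[doc_id] = {"doc": logical_id, "version": version, "body": dict(body)}
        return doc_id

    def history(self, logical_id):
        versions = [d for d in self.docs.values() if d["doc"] == logical_id]
        return sorted(versions, key=lambda d: d["version"])

    def latest(self, logical_id):
        return self.history(logical_id)[-1]["body"]


store = VersionedDocs()
store.save("terminator", {})
store.save("terminator", {"speed": 1})
history = store.history("terminator")
```

Since each version is a document in its own right, it would be replicated and kept through compaction like any other document. Counting existing versions to pick the next number is only safe with a single writer; with several replicas a different id scheme would be needed.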

The way I see it, the salient options are:

A) leave it as _rev and answer the versioning question every week forever
B) rename it to _mvcc or _lock or _token or something else that
doesn't confuse people

The main drawback of B is that when we start renaming _rev, someone
else comes along and tries to take the opportunity to change _id, or
otherwise change the whole system. If we can stick to just renaming to
something clearer, I'm happy to go ahead with this.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Antony Blakey <an...@gmail.com>.

On 24/02/2009, at 9:32 AM, Dean Landolt wrote:

>> Can you suggest how we improve the wiki docs to satisfy this? In my
>> opinion, the docs are clear* and the term is overloaded and  
>> confusing.
>>
>> * http://wiki.apache.org/couchdb/Document_revisions has
>> "You cannot rely on document revisions for any other purpose
>> than concurrency control." in bold letters.
>>
>> I stated this in earlier discussions as well: Even if our  
>> documentation
>> were perfect, we don't control how people learn about CouchDB. We
>> only control the API and we should work hard to get it right.
>>
>> The way it stands now, a lot of people new to CouchDB get it wrong
>> because "revision" is a familiar term and they associate the  
>> behaviour
>> they associate with it to them. That's how humans learn. In this case
>> we make the learning hard.

Firstly, I completely agree that one should consider the implications  
of using certain terms; the baggage and context such terms bring with  
them.

<flamesuit on>
OTOH, one should use the correct term and not redefine existing terms  
to suit one's own purpose. In a tangentially related way, the use of  
the term RESTful wrt CouchDB is a marketing abomination.
</flamesuit off>

The documentation about replication, the role of revisions, the lack  
of inter-document consistency guarantees (including, crucially to the  
operation model, the lack of Monotonic Write guarantees), really needs  
to be expanded.

The consequences of CouchDB's underlying model aren't immediately  
obvious, and should be spelled out, as I started to do here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c0FDDC57C-DB78-4241-86DE-549FECC8B558@gmail.com%3e 
  - which was obviously in the context of changing that mechanism, but  
still the explanation and references are useful.

> I couldn't agree more with this sentiment, but revision still  
> strikes me as
> the right term. Perhaps the easiest way to fix this misconception is  
> for
> there to actually be a way to keep old revisions around for good :)
>
> Would it be overly difficult to just add in the ability to keep a  
> full rev
> history based on a config setting? The replication api would need to
> accommodate this, of course, and if the machine you're replicating  
> from
> doesn't also keep old revisions around your SOL, but is there any  
> other
> compelling reason to not offer this option? If it wouldn't  
> complicate the
> code base, this seems like a helpful feature. Sure, it could be  
> wasteful and
> should be off by default, but if your dataset is relatively small,  
> this
> config flag would be pretty nice to have, and it could help clear up  
> this
> confusion.

Danger Will Robinson!

The problem here is that you then need to make certain guarantees  
about revisions to make them at all useful, and you get into a  
discussion like the above email thread.

IMO, discussing these issues without having read the relevant  
literature around replication models, is a waste of time. Serious  
research has been done into this, and (once again, IMO) it is more  
productive to advance that understanding than try (and possibly fail)  
to reinvent the wheel.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A priest, a minister and a rabbi walk into a bar. The bartender says  
"What is this, a joke?"

Re: Fail on a simple case on replication

Posted by Paul Davis <pa...@gmail.com>.

On Mon, Feb 23, 2009 at 6:02 PM, Dean Landolt <de...@deanlandolt.com> wrote:
> On Mon, Feb 23, 2009 at 10:30 AM, Jan Lehnardt <ja...@apache.org> wrote:
>
>>
>> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>>
>>  For a reminder :
>>>
>>> revision  (n)
>>> 1. the act or process of revising,
>>> 2. a corrected or new version of a book, article, etc.
>>>
>>> For me this term is correct with the use in Couch
>>>
>>
>> Damien is not saying the usage is wrong in CouchDB, but people
>> associate more with "revision" than he'd like. Hence the proposal.
>>
>>
>>  I think a good explanation of what a compaction/replication are doing (ie
>>> removing  old rev, or replicating only current rev) is the right solution
>>> to
>>> this misunderstanding
>>>
>>
>> Can you suggest how we improve the wiki docs to satisfy this? In my
>> opinion, the docs are clear* and the term is overloaded and confusing.
>>
>> * http://wiki.apache.org/couchdb/Document_revisions has
>> "You cannot rely on document revisions for any other purpose
>> than concurrency control." in bold letters.
>>
>> I stated this in earlier discussions as well: Even if our documentation
>> were perfect, we don't control how people learn about CouchDB. We
>> only control the API and we should work hard to get it right.
>>
>> The way it stands now, a lot of people new to CouchDB get it wrong
>> because "revision" is a familiar term and they associate the behaviour
>> they associate with it to them. That's how humans learn. In this case
>> we make the learning hard.
>
>
> I couldn't agree more with this sentiment, but revision still strikes me as
> the right term. Perhaps the easiest way to fix this misconception is for
> there to actually be a way to keep old revisions around for good :)
>
> Would it be overly difficult to just add in the ability to keep a full rev
> history based on a config setting? The replication api would need to
> accommodate this, of course, and if the machine you're replicating from
> doesn't also keep old revisions around your SOL, but is there any other
> compelling reason to not offer this option? If it wouldn't complicate the
> code base, this seems like a helpful feature. Sure, it could be wasteful and
> should be off by default, but if your dataset is relatively small, this
> config flag would be pretty nice to have, and it could help clear up this
> confusion.
>

I don't (yet) have a very thorough knowledge of everything that happens
inside the db files, but from the little I do know changing the
operation seems like it'd be a tall order. Then again, I could be
wrong.

Also, my suggestion for renaming would be _lock.

HTH,
Paul Davis

Re: Fail on a simple case on replication

Posted by Dean Landolt <de...@deanlandolt.com>.

On Mon, Feb 23, 2009 at 10:30 AM, Jan Lehnardt <ja...@apache.org> wrote:

>
> On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:
>
>  For a reminder :
>>
>> revision  (n)
>> 1. the act or process of revising,
>> 2. a corrected or new version of a book, article, etc.
>>
>> For me this term is correct with the use in Couch
>>
>
> Damien is not saying the usage is wrong in CouchDB, but people
> associate more with "revision" than he'd like. Hence the proposal.
>
>
>  I think a good explanation of what a compaction/replication are doing (ie
>> removing  old rev, or replicating only current rev) is the right solution
>> to
>> this misunderstanding
>>
>
> Can you suggest how we improve the wiki docs to satisfy this? In my
> opinion, the docs are clear* and the term is overloaded and confusing.
>
> * http://wiki.apache.org/couchdb/Document_revisions has
> "You cannot rely on document revisions for any other purpose
> than concurrency control." in bold letters.
>
> I stated this in earlier discussions as well: Even if our documentation
> were perfect, we don't control how people learn about CouchDB. We
> only control the API and we should work hard to get it right.
>
> The way it stands now, a lot of people new to CouchDB get it wrong
> because "revision" is a familiar term and they associate the behaviour
> they associate with it to them. That's how humans learn. In this case
> we make the learning hard.

I couldn't agree more with this sentiment, but revision still strikes me as
the right term. Perhaps the easiest way to fix this misconception is for
there to actually be a way to keep old revisions around for good :)

Would it be overly difficult to just add in the ability to keep a full rev
history based on a config setting? The replication api would need to
accommodate this, of course, and if the machine you're replicating from
doesn't also keep old revisions around you're SOL, but is there any other
compelling reason to not offer this option? If it wouldn't complicate the
code base, this seems like a helpful feature. Sure, it could be wasteful and
should be off by default, but if your dataset is relatively small, this
config flag would be pretty nice to have, and it could help clear up this
confusion.

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

On 23 Feb 2009, at 16:11, Patrick Antivackis wrote:

> For a reminder :
>
> revision  (n)
> 1. the act or process of revising,
> 2. a corrected or new version of a book, article, etc.
>
> For me this term is correct with the use in Couch

Damien is not saying the usage is wrong in CouchDB, but people
associate more with "revision" than he'd like. Hence the proposal.


> I think a good explanation of what a compaction/replication are  
> doing (ie
> removing  old rev, or replicating only current rev) is the right  
> solution to
> this misunderstanding

Can you suggest how we improve the wiki docs to satisfy this? In my
opinion, the docs are clear* and the term is overloaded and confusing.

* http://wiki.apache.org/couchdb/Document_revisions has
"You cannot rely on document revisions for any other purpose
than concurrency control." in bold letters.

I stated this in earlier discussions as well: Even if our documentation
were perfect, we don't control how people learn about CouchDB. We
only control the API and we should work hard to get it right.

The way it stands now, a lot of people new to CouchDB get it wrong
because "revision" is a familiar term and they expect the behaviour
they already associate with it. That's how humans learn. In this case
we make the learning hard.

Cheers
Jan
--



Re: Fail on a simple case on replication

Posted by Patrick Antivackis <pa...@gmail.com>.

As a reminder:

revision (n)
1. the act or process of revising,
2. a corrected or new version of a book, article, etc.

To me, this term fits the way Couch uses it.

I think a good explanation of what compaction and replication do (i.e.
removing old revs, or replicating only the current rev) is the right
solution to this misunderstanding.

> - Remove the ability to get old revisions

-1 : This functionality is interesting for some case studies

> - Make it much harder/verbose to get old revision

-1 : I don't see the utility of this

> - Make the api to get old revisions something like
> "?old_rev_that_might_still_be_on_disk=...."

0 :

> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.

-1 : revision seems the right term to me



Re: Fail on a simple case on replication

Posted by Robert Dionne <di...@dionne-associates.com>.


On Feb 23, 2009, at 9:16 AM, Damien Katz wrote:

> This is a very common misconception about the revision system. Any  
> ideas how we can make this better?
>
> random ideas:
> - Remove the ability to get old revisions

+1

> - Make it much harder/verbose to get old revision
> - Make the api to get old revisions something like
> "?old_rev_that_might_still_be_on_disk=...."
> - Don't call them revisions, call them "turd blossoms" or "hobo  
> socks". People won't know what they are, but at least they won't  
> misuse them.

+1 change _rev to _internal_id



Re: Fail on a simple case on replication

Posted by Ulises <ul...@gmail.com>.

> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.

+1 as revision is too tied to CVS and friends. I'm not so sure about
turd blossoms though ;)

U

Re: Fail on a simple case on replication

Posted by Chris Anderson <jc...@apache.org>.

On Mon, Feb 23, 2009 at 6:16 AM, Damien Katz <da...@apache.org> wrote:
> This is a very common misconception about the revision system. Any ideas how
> we can make this better?
>
> random ideas:
> - Don't call them revisions, call them "turd blossoms" or "hobo socks".
> People won't know what they are, but at least they won't misuse them.
>

+1 for _mvcc (or longer) _mvcc_token

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Fail on a simple case on replication

Posted by Jan Lehnardt <ja...@apache.org>.

Hi,

On 23 Feb 2009, at 15:16, Damien Katz wrote:

> This is a very common misconception about the revision system. Any  
> ideas how we can make this better?
>
> random ideas:
> - Remove the ability to get old revisions
> - Make it much harder/verbose to get old revision
> - Make the api to get old revisions something like
> "?old_rev_that_might_still_be_on_disk=...."
> - Don't call them revisions, call them "turd blossoms" or "hobo  
> socks". People won't know what they are, but at least they won't  
> misuse them.

I like to think of the _rev as an access token the user has to
provide when attempting a write. So _token would be an idea
or something else along these lines.

Since we are only operating on the "previous" revision for this,
we could also name it _prev.
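
To illustrate the "access token" reading of _rev, here is a minimal
in-memory sketch in Python. It is a toy model, not CouchDB code: the
names TokenStore and Conflict are made up for this example, and the
integer counter used as a token is an assumption (CouchDB's real
revision ids are opaque strings).

```python
class Conflict(Exception):
    """Raised when a write does not present the current token."""


class TokenStore:
    """Toy document store where every update must carry the current token."""

    def __init__(self):
        self._docs = {}  # doc_id -> (token, body)

    def read(self, doc_id):
        # Returns the (token, body) pair currently stored for doc_id.
        return self._docs[doc_id]

    def write(self, doc_id, body, token=None):
        current = self._docs.get(doc_id)
        current_token = current[0] if current else None
        # The write is accepted only if the caller saw the latest state.
        if token != current_token:
            raise Conflict(f"stale token {token!r} for {doc_id!r}")
        new_token = 1 if current_token is None else current_token + 1
        self._docs[doc_id] = (new_token, dict(body))
        return new_token


store = TokenStore()
first = store.write("terminator", {})                          # create
second = store.write("terminator", {"speed": 1}, token=first)  # update
try:
    store.write("terminator", {"speed": 2}, token=first)       # stale token
except Conflict as err:
    print("rejected:", err)
```

In this model the token is only ever compared with the current one; the
store never needs the old bodies, which is the point of treating the
value as a token rather than as a version history.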


Cheers
Jan
--



Fwd: Fail on a simple case on replication

Posted by Damien Katz <da...@apache.org>.

This is a very common misconception about the revision system. Any  
ideas how we can make this better?

random ideas:
- Remove the ability to get old revisions
- Make it much harder/verbose to get old revision
- Make the api to get old revisions something like
"?old_rev_that_might_still_be_on_disk=...."
- Don't call them revisions, call them "turd blossoms" or "hobo  
socks". People won't know what they are, but at least they won't  
misuse them.

-Damien

Begin forwarded message:

> From: Damien Katz <da...@apache.org>
> Date: February 23, 2009 9:09:09 AM EST
> To: user@couchdb.apache.org
> Subject: Re: Fail on a simple case on replication
> Reply-To: user@couchdb.apache.org
>
> Revisions are made available as a convenience, but CouchDB doesn't  
> replicate old revisions, only the most recent. Also compaction will  
> remove old revisions as well.
>
> -Damien
>
> On Feb 23, 2009, at 9:00 AM, Manolo Padron Martinez wrote:
>
>> Hi:
>>
>> I'm trying to test the replication process with two local database  
>> and I
>> found that replication process don't work as it should (or as I  
>> think it
>> would)
>>
>> The case:
>>
>> 1º Create a db called t2.
>> 2º Create a document called terminator.
>> 3º Add a property to the document, so that makes a new revision,  
>> with a
>> property called speed and the value 1
>> 4º Create a new db called t3.
>> 5º Launch replication process from t2 to t3.
>>
>> In t3 should be a document with two revisions, and if I point to
>> "t3/terminator?revs=true" appears two revisions. If I try to get  
>> the last
>> revision it works as it should but If I try to get the first  
>> revision (the
>> one without properties) I get a "not found" message.
>>
>> In t2 database, this works without problems so I think that is a  
>> problem
>> with replication.
>>
>> I've tried with debian , with the lastest in the web (0.8.1), and  
>> the trunk
>> svn version with the same results.
>>
>> Anyone could help me or the terminator will kill me? :-)
>>
>> Thanks in advance
>>
>> Manolo Padrón Martínez
>

Re: Fail on a simple case on replication

Posted by Damien Katz <da...@apache.org>.

Revisions are made available as a convenience, but CouchDB doesn't
replicate old revisions, only the most recent. Compaction will also
remove old revisions.
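
The behaviour described above can be sketched with a small in-memory
model in Python. This is only an illustration under stated assumptions,
not how CouchDB is implemented: ToyDB, replicate and the "1", "2"
revision ids are hypothetical names chosen for the example. The model
copies the revision list but only the latest body, which reproduces the
reported symptom (both revisions are listed in t3, but the first one is
"not found").

```python
class ToyDB:
    """Toy database: keeps a revision list and whatever bodies are on hand."""

    def __init__(self):
        self.history = {}  # doc_id -> list of revision ids, oldest first
        self.bodies = {}   # (doc_id, rev_id) -> document body

    def put(self, doc_id, body):
        revs = self.history.setdefault(doc_id, [])
        rev_id = str(len(revs) + 1)
        revs.append(rev_id)
        self.bodies[(doc_id, rev_id)] = dict(body)
        return rev_id

    def get(self, doc_id, rev_id=None):
        # Without rev_id, return the latest revision.
        rev_id = rev_id or self.history[doc_id][-1]
        if (doc_id, rev_id) not in self.bodies:
            raise KeyError("not found")
        return self.bodies[(doc_id, rev_id)]

    def compact(self):
        # Drop every body except the latest one for each document.
        for doc_id, revs in self.history.items():
            for old in revs[:-1]:
                self.bodies.pop((doc_id, old), None)


def replicate(source, target):
    # Copy the list of revision ids, but only the most recent body.
    for doc_id, revs in source.history.items():
        latest = revs[-1]
        target.history[doc_id] = list(revs)
        target.bodies[(doc_id, latest)] = dict(source.bodies[(doc_id, latest)])


t2, t3 = ToyDB(), ToyDB()
rev1 = t2.put("terminator", {})
rev2 = t2.put("terminator", {"speed": 1})
replicate(t2, t3)
```

After these steps t3 still lists both revision ids, t3.get("terminator")
returns the latest body, and t3.get("terminator", rev1) raises
KeyError. Calling t2.compact() makes the source behave the same way.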

-Damien

On Feb 23, 2009, at 9:00 AM, Manolo Padron Martinez wrote:

> Hi:
>
> I'm trying to test the replication process with two local database  
> and I
> found that replication process don't work as it should (or as I  
> think it
> would)
>
> The case:
>
> 1º Create a db called t2.
> 2º Create a document called terminator.
> 3º Add a property to the document, so that makes a new revision,  
> with a
> property called speed and the value 1
> 4º Create a new db called t3.
> 5º Launch replication process from t2 to t3.
>
> In t3 should be a document with two revisions, and if I point to
> "t3/terminator?revs=true" appears two revisions. If I try to get the  
> last
> revision it works as it should but If I try to get the first  
> revision (the
> one without properties) I get a "not found" message.
>
> In t2 database, this works without problems so I think that is a  
> problem
> with replication.
>
> I've tried with debian , with the lastest in the web (0.8.1), and  
> the trunk
> svn version with the same results.
>
> Anyone could help me or the terminator will kill me? :-)
>
> Thanks in advance
>
> Manolo Padrón Martínez