Posted to dev@couchdb.apache.org by Antony Blakey <an...@gmail.com> on 2009/01/05 14:26:25 UTC
Proposal: Extending immutability
I've cc'd this to couchdb-user, because I think this discussion
belongs on -dev, but everyone watches -user.
One of the great features of Couch is the use of optimistic locking
i.e. rev as a bedrock mechanism, and the way this is permeated through
the API. The combination of id + rev is a reference to an immutable
value (with some caveats, one subject of this proposal). This means
that you get caching for free. By keying off id + rev, you can cache
the document along with any (functional) derived values. Additionally
you can trivially memoize functions of multiple documents using that
mechanism.
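A minimal sketch of caching a derived value by id + rev, assuming documents arrive as dicts carrying `_id` and `_rev`. The class name `DerivedValueCache` and the word-count function are illustrative only, not part of CouchDB or any client library.

```python
from typing import Callable, Dict, Tuple

DocRef = Tuple[str, str]  # (document id, document rev)


class DerivedValueCache:
    """Caches a pure function of a document, keyed by (id, rev)."""

    def __init__(self, derive: Callable[[dict], object]) -> None:
        self._derive = derive
        self._values: Dict[DocRef, object] = {}
        self.misses = 0  # counts how often the derivation actually ran

    def get(self, doc: dict) -> object:
        ref = (doc["_id"], doc["_rev"])
        if ref not in self._values:
            # A new rev is a new immutable value, so it gets its own entry.
            self.misses += 1
            self._values[ref] = self._derive(doc)
        return self._values[ref]


# Hypothetical derived value: number of words in the document body.
word_count = DerivedValueCache(lambda doc: len(doc["body"].split()))
doc_v1 = {"_id": "page-1", "_rev": "1-abc", "body": "hello couch"}
doc_v2 = {"_id": "page-1", "_rev": "2-def", "body": "hello again couch"}
```

Because an (id, rev) pair never changes meaning, entries never need invalidating; stale revs can simply be evicted when space runs short.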
I use this to good effect in my application, where I aggressively
cache the documents (which are sometimes large) and therefore don't
need the document content in queries. To take advantage of this,
however, my views need to include the _rev as the value, and
transformations that would normally happen in the map happen in the
client.
It would be very useful to have the rev returned wherever an id is
returned, specifically in view results. You could then use a view
without include_docs, and get the ids and revs. You can keep a cache
(per view, per db) of the results. The actual view results only need
to be fetched on a cache miss, which can be driven by the cache
machinery.
The nice thing is that all of this caching machinery can be
transparently interposed, except when the view definition is changed.
So I also propose to have the rev and id of the design doc returned in
the view results. And for completeness, every database should be
assigned a UUID when it is created. This UUID should be provided in
the dbinfo, and for every view and view-like result.
This means that from every view result you can construct a list of
universally unique references to immutable values e.g. DB UUID + (View
id + rev) + (Document id + rev). A form of referential transparency -
and with a cache and a little bit of 100% generic machinery, it can be
true referential transparency. Clients don't have to watch/be notified
about changes to design docs, or even database creation/deletion.
Systemwide transparent caching in particular becomes trivial.
So, in summary I propose:
1. Provide the document rev whenever the id is returned, such as view
results i.e. not in the document, but in the per-row hash.
2. Provide the design document id and rev in view results i.e. in the
top level hash.
3. Add a UUID to databases, and provide that in view results i.e. in
the top level hash, and all other database operation results.
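As a rough illustration of the three points, here is what a view response might look like and how a client could turn it into references. The field names `db_uuid`, `design_doc` and the per-row `rev` are hypothetical names for the proposed additions, not fields CouchDB is known to return.

```python
from typing import List, Tuple

# Hypothetical response shape: "db_uuid", "design_doc" and the per-row "rev"
# are invented names standing in for the proposed additions.
view_response = {
    "db_uuid": "hypothetical-db-uuid",
    "design_doc": {"id": "_design/pages", "rev": "3-aaa"},
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {"id": "page-1", "rev": "1-abc", "key": "a", "value": None},
        {"id": "page-2", "rev": "4-def", "key": "b", "value": None},
    ],
}

# DB UUID + (view id + rev) + (document id + rev)
Reference = Tuple[str, Tuple[str, str], Tuple[str, str]]


def references(response: dict) -> List[Reference]:
    """Build one universally unique reference per row of a view response."""
    ddoc = response["design_doc"]
    view_ref = (ddoc["id"], ddoc["rev"])
    return [
        (response["db_uuid"], view_ref, (row["id"], row["rev"]))
        for row in response["rows"]
    ]
```

Each tuple is hashable, so it can be used directly as a cache key by generic client machinery.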
I think you could do this even with reduce results, but I haven't
thought a lot about it.
I think this generalises the current API in a very useful way that
will greatly simplify, and hence 'robustify', client code. Although I
haven't checked the implementation code, my experience so far suggests
this isn't difficult.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Every task involves constraint,
Solve the thing without complaint;
There are magic links and chains
Forged to loose our rigid brains.
Structures, structures, though they bind,
Strangely liberate the mind.
-- James Fallen
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 12/01/2009, at 10:26 PM, Noah Slater wrote:
> On Mon, Jan 12, 2009 at 05:15:39PM +1030, Antony Blakey wrote:
>> --- Revised Proposal ---------
>>
>> Each document, whether canonical or derived, has a globally unique
>> identity consisting of a UUID and the document ID.
>>
>> In the case of a canonical document, the UUID is the UUID of the
>> database (or cluster), which is assigned when a database is created.
>>
>> In the case of a (derived) view map result, it is the UUID of the map
>> function (not the design doc), which is assigned to each map function
>> (i.e. view) in a design doc when the design doc is created or
>> updated.
>>
>> Furthermore, there is a triple {UUID, document id, document rev} that
>> globally identifies a document at a given point in time. The key
>> characteristic being that a {UUID, id, rev} identifies an immutable
>> value.
For views it is slightly more complex. At first blush you need to
include the key because a given map function can emit multiple view
rows per document. But from a theoretical standpoint it's worse than
that, because the map function can emit multiple view rows with
identical keys but different values. This is a pathological scenario
however, and IMO a generic middleware that doesn't deal with that
situation is still extraordinarily useful.
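A small sketch of that row-identity problem, assuming each row carries `id`, `rev` and `key` (the per-row `rev` is the proposed addition, and `view_uuid` is the hypothetical per-map-function UUID). Keys are serialised to JSON text so that array keys remain hashable.

```python
import json
from collections import Counter
from typing import List, Tuple

RowIdentity = Tuple[str, str, str, str]  # (view UUID, doc id, doc rev, key as JSON)


def row_identity(view_uuid: str, row: dict) -> RowIdentity:
    """Identity of one view row: the document triple extended with the key."""
    key_text = json.dumps(row["key"], sort_keys=True)
    return (view_uuid, row["id"], row["rev"], key_text)


def ambiguous_identities(view_uuid: str, rows: List[dict]) -> List[RowIdentity]:
    """Identities shared by more than one row (the pathological case)."""
    counts = Counter(row_identity(view_uuid, row) for row in rows)
    return [identity for identity, n in counts.items() if n > 1]


# One document emitting three rows, two of them under the same key.
rows = [
    {"id": "doc-1", "rev": "1-abc", "key": "red", "value": 1},
    {"id": "doc-1", "rev": "1-abc", "key": "blue", "value": 2},
    {"id": "doc-1", "rev": "1-abc", "key": "blue", "value": 3},
]
```

A generic middleware could use `ambiguous_identities` to detect the pathological case and skip caching for those rows rather than trying to handle them.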
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
A reasonable man adapts himself to suit his environment. An
unreasonable man persists in attempting to adapt his environment to
suit himself. Therefore, all progress depends on the unreasonable man.
-- George Bernard Shaw
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 12/01/2009, at 10:26 PM, Noah Slater wrote:
> On Mon, Jan 12, 2009 at 05:15:39PM +1030, Antony Blakey wrote:
>> --- Revised Proposal ---------
>>
>> Each document, whether canonical or derived, has a globally unique
>> identity consisting of a UUID and the document ID.
>>
>> In the case of a canonical document, the UUID is the UUID of the
>> database (or cluster), which is assigned when a database is created.
>>
>> In the case of a (derived) view map result, it is the UUID of the map
>> function (not the design doc), which is assigned to each map function
>> (i.e. view) in a design doc when the design doc is created or
>> updated.
>>
>> Furthermore, there is a triple {UUID, document id, document rev} that
>> globally identifies a document at a given point in time. The key
>> characteristic being that a {UUID, id, rev} identifies an immutable
>> value.
>
> Why use UUID like this? Why not let the database (&c) name suffice?
Because if the database is deleted and the name reused, intermediaries
can't tell. We presume that's not likely, but in a world of
database-per-user (as discussed on IRC recently), which could
conceivably be extended to domains other than users, who knows what
the future might bring? The same goes for _externals wanting to
invalidate secondary indexes in such a scenario.
i.e. the UUID of a database represents a context in which (id + rev)
is immutable. If you use the db name, that's no longer true because
you no longer have a globally unique namespace.
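To make the name-reuse problem concrete, here is a toy simulation. `FakeServer` is an in-memory stand-in invented for this sketch, and it assumes the worst case in which the recreated database hands out the same rev string for a different document.

```python
import uuid
from typing import Dict


class FakeServer:
    """In-memory stand-in for a server that assigns a UUID per database."""

    def __init__(self) -> None:
        self._uuids: Dict[str, str] = {}

    def create(self, name: str) -> str:
        # A fresh UUID is assigned every time a database is created.
        self._uuids[name] = uuid.uuid4().hex
        return self._uuids[name]

    def delete(self, name: str) -> None:
        del self._uuids[name]


server = FakeServer()
first_uuid = server.create("user_alice")

# An intermediary caches the same value under two candidate key schemes.
by_name = {("user_alice", "profile", "1-abc"): "old profile"}
by_uuid = {(first_uuid, "profile", "1-abc"): "old profile"}

# The database is deleted and its name reused for unrelated content.
server.delete("user_alice")
second_uuid = server.create("user_alice")

# Keyed by name, the cache still answers with the old value.
stale_hit = ("user_alice", "profile", "1-abc") in by_name
# Keyed by UUID, the lookup misses and forces a fetch.
uuid_hit = (second_uuid, "profile", "1-abc") in by_uuid
```

Only the UUID-keyed cache notices that the context changed, without any notification from the server.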
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Isn't it enough to see that a garden is beautiful without having to
believe that there are fairies at the bottom of it too?
-- Douglas Adams
Re: Proposal: Extending immutability
Posted by Noah Slater <ns...@apache.org>.
On Mon, Jan 12, 2009 at 05:15:39PM +1030, Antony Blakey wrote:
> --- Revised Proposal ---------
>
> Each document, whether canonical or derived, has a globally unique
> identity consisting of a UUID and the document ID.
>
> In the case of a canonical document, the UUID is the UUID of the
> database (or cluster), which is assigned when a database is created.
>
> In the case of a (derived) view map result, it is the UUID of the map
> function (not the design doc), which is assigned to each map function
> (i.e. view) in a design doc when the design doc is created or updated.
>
> Furthermore, there is a triple {UUID, document id, document rev} that
> globally identifies a document at a given point in time. The key
> characteristic being that a {UUID, id, rev} identifies an immutable
> value.
Why use UUID like this? Why not let the database (&c) name suffice?
--
Noah Slater, http://tumbolia.org/nslater
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
--- Revised Proposal ---------
Each document, whether canonical or derived, has a globally unique
identity consisting of a UUID and the document ID.
In the case of a canonical document, the UUID is the UUID of the
database (or cluster), which is assigned when a database is created.
In the case of a (derived) view map result, it is the UUID of the map
function (not the design doc), which is assigned to each map function
(i.e. view) in a design doc when the design doc is created or updated.
Furthermore, there is a triple {UUID, document id, document rev} that
globally identifies a document at a given point in time. The key
characteristic being that a {UUID, id, rev} identifies an immutable
value.
------------------------------
I have a real-life use-case for this.
My project stores a number of XHTML documents that use microformat
markup. The documents rarely change, but they can. I have views that
provide derived documents using E4X to reflect on the microformat
markup. In my client I produce further derived results from the view
values.
Each web page query requires the in-client derived results from a
number of these documents, which come in from a view query. The ideal
situation would be if I could query the view, omitting the value
(minor detail, but potentially beneficial), and receive the key,
document id, document rev, and a UUID as described above, that
globally qualifies the document id.
Thus I could easily cache my derived results, knowing that I have
value-based cache keys. Furthermore I can easily cache functional
combinations of such identified fragments, using simple multi-key
memoization.
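A sketch of multi-key memoization under these assumptions: each fragment is identified by a {UUID, id, rev} triple, `load` fetches a fragment by that triple, and `combine` is a pure function of the loaded fragments. The helper name `memoize_by_refs` is made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Ref = Tuple[str, str, str]  # (UUID, document id, document rev)


def memoize_by_refs(
    combine: Callable[[Sequence[object]], object],
    load: Callable[[Ref], object],
) -> Callable[..., object]:
    """Memoize a pure function of several fragments, keyed by their references."""
    results: Dict[Tuple[Ref, ...], object] = {}

    def memoized(*refs: Ref) -> object:
        if refs not in results:
            # Fragments are only loaded when this combination is new.
            results[refs] = combine([load(ref) for ref in refs])
        return results[refs]

    return memoized


# Hypothetical fragment store and a load function that records its calls.
store = {
    ("db-1", "page-1", "1-abc"): "Hello",
    ("db-1", "page-2", "1-def"): "World",
    ("db-1", "page-2", "2-ghi"): "Couch",
}
loads: List[Ref] = []


def load(ref: Ref) -> object:
    loads.append(ref)
    return store[ref]


join = memoize_by_refs(lambda parts: " ".join(parts), load)
```

When one document gets a new rev, only combinations that include it produce a new key; every other memoized result stays valid.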
I can build this as 100% generic caching/transformation middleware
that allows me to register functional transformations, as long as
couch provides the appropriate details independent of the structure of
the value returned from the view.
I can't rely on etags, because they are dependent on the view query
parameters e.g. start/end keys.
I don't want to put the _rev into the view result - it doesn't belong
there because it's not part of the domain data, and to do so is a
hack. My view results are not structured.
I don't want to have to hook into a notification mechanism to detect
design doc and database changes. The design docs can change when new
versions of the software are deployed into a running system. The system
shouldn't have to restart in this situation.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
A Man may make a Remark –
In itself – a quiet thing
That may furnish the Fuse unto a Spark
In dormant nature – lain –
Let us divide – with skill –
Let us discourse – with care –
Powder exists in Charcoal –
Before it exists in Fire –
-- Emily Dickinson 913 (1865)
Re: Proposal: Extending immutability
Posted by Randall Leeds <ra...@gmail.com>.
On Mon, Jan 5, 2009 at 17:25, Antony Blakey <an...@gmail.com> wrote:
>
> But then you need to get the view value rather than just the row metadata.
> I propose including the rev in the view result row so that it appears even
> if the value doesn't. Furthermore it means that you can write (third-party)
> generic interposition software without requiring that your views have a
> certain value structure.
>
+1, but see caveat below
> I disagree - etags is a property of the transport protocol, and is at the
> wrong granularity. If all but one document in the view result hasn't changed
> then a transparent document cache would have no way of knowing if all of the
> documents have changed or just one. This affects memoization, which isn't
> based on view window requests.
This is what I was saying in my last post. Etags are good (and we should
support them), but they are not very granular in this case.
> But I'm suggesting per-document / per-view-row immutable-value-identity,
> which is quite different than per-view-window caching.
I'm now thinking about what you're saying here, Antony, and I think I might
be starting to agree. I was going to suggest that perhaps generating the
etag based on the _rev of each document included, uuid of database, design
doc seq #, etc might accomplish what you want. However, even though I can't
see a definite use case immediately, I can see the logic behind the
transparency you speak of.
I like it.
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 06/01/2009, at 5:51 AM, Chris Anderson wrote:
> Currently you have the freedom to add the rev to the value if you need
> it. I think we can get the semantics you desire without adding more
> fields to the view results.
But then you need to get the view value rather than just the row
metadata. I propose including the rev in the view result row so that
it appears even if the value doesn't. Furthermore it means that you
can write (third-party) generic interposition software without
requiring that your views have a certain value structure.
> On the "roadmap", and actually within striking distance, it is planned
> that CouchDB will return Etags with view responses. We could throw all
> the information needed to calculate an Etag into the response JSON
> (and there might be other compelling reasons to do so) but for the
> sake of caching, I think getting the Etags right is the thing to
> concentrate on.
I disagree - etags is a property of the transport protocol, and is at
the wrong granularity. If all but one document in the view result
hasn't changed then a transparent document cache would have no way of
knowing if all of the documents have changed or just one. This affects
memoization, which isn't based on view window requests.
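The granularity difference can be sketched as follows. `window_digest` is only a stand-in for a per-window Etag (it is not how CouchDB computes Etags), and the per-row `rev` is the proposed addition.

```python
import hashlib
import json
from typing import Dict, List


def window_digest(rows: List[dict]) -> str:
    """Stand-in for a per-window Etag: one digest over every row in the window."""
    payload = json.dumps([[r["id"], r["rev"]] for r in rows])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def rows_to_refetch(cached_revs: Dict[str, str], rows: List[dict]) -> List[str]:
    """Per-row check: only ids whose rev differs from the cached one are stale."""
    return [r["id"] for r in rows if cached_revs.get(r["id"]) != r["rev"]]


# The same view window before and after document "b" is updated.
before = [{"id": "a", "rev": "1-x"}, {"id": "b", "rev": "1-y"}, {"id": "c", "rev": "1-z"}]
after = [{"id": "a", "rev": "1-x"}, {"id": "b", "rev": "2-q"}, {"id": "c", "rev": "1-z"}]
cached_revs = {r["id"]: r["rev"] for r in before}
```

The window digest changes as a whole and says nothing about which rows moved, whereas the per-row comparison singles out the one document that needs refetching.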
> I think that with proper use of Etags, the JSON content of the views
> becomes irrelevant for their cacheability.
But I'm suggesting per-document / per-view-row
immutable-value-identity, which is quite different than per-view-window
caching.
This isn't just about caching - I'm suggesting the adoption of a
theoretical model that will have benefits both immediately obvious,
and not yet evident.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
It is no measure of health to be well adjusted to a profoundly sick
society.
-- Jiddu Krishnamurti
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 06/01/2009, at 8:19 AM, Randall Leeds wrote:
> Is there a way we can get more fine-grained cache efficiency here?
> There might be complications with reduce (I'll try to think through
> it more). If this is the case then maybe this is not a priority for
> Couch. If an individual application knows that this sort of check
> with _rev on each document allows for some extra client-side caching
> without negative side effects maybe it should be left to the
> application.
If there is some easy way of extending the principle of references to
immutable values, one that enables 100% generic client libraries or
interposing caching software, then this IMO is a better idea than
leaving it to the application.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
The trouble with the world is that the stupid are cocksure and the
intelligent are full of doubt.
-- Bertrand Russell
Re: Proposal: Extending immutability
Posted by Randall Leeds <ra...@gmail.com>.
> I've mentioned this before, but I'll restate what I think the
> necessary inputs to an Etag generation function are:
>
> The view function(s) source code.
>
The _rev of the design doc makes more sense to my brain. Is there a
technical argument either way? I don't immediately see one.
> The seq # of the last db-update that changed the view index. This
> should be somewhat straightforward to pull from the view index. Until
> we are able to do that, we can get by with the current DB seq # at a
> loss of cache efficiency.
I think there might be some extra cache-ability to be obtained by including
the _rev in the view result rows. For example, this seq number would
change when a new document is indexed by the view but a particular
[startkey,endkey] request might not include that document. In the case of a
straight map without reduce we would have a different etag but not really a
different result.
Is there a way we can get more fine-grained cache efficiency here? There
might be complications with reduce (I'll try to think through it more). If
this is the case then maybe this is not a priority for Couch. If an
individual application knows that this sort of check with _rev on each
document allows for some extra client-side caching without negative side
effects maybe it should be left to the application.
-Randall
Re: Proposal: Extending immutability
Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Jan 5, 2009 at 5:26 AM, Antony Blakey <an...@gmail.com> wrote:
>
> 1. Provide the document rev whenever the id is returned, such as view
> results i.e. not in the document, but in the per-row hash.
Currently you have the freedom to add the rev to the value if you need
it. I think we can get the semantics you desire without adding more
fields to the view results.
>
> I think this generalises the current API in a very useful way, that will
> greatly simplify, and hence 'robustify' client code. Although I haven't
> checked the implementation code, my experience so far suggests this isn't
> difficult.
>
On the "roadmap", and actually within striking distance, it is planned
that CouchDB will return Etags with view responses. We could throw all
the information needed to calculate an Etag into the response JSON
(and there might be other compelling reasons to do so) but for the
sake of caching, I think getting the Etags right is the thing to
concentrate on.
I've mentioned this before, but I'll restate what I think the
necessary inputs to an Etag generation function are:
The view function(s) source code.
The seq # of the last db-update that changed the view index. This
should be somewhat straightforward to pull from the view index. Until
we are able to do that, we can get by with the current DB seq # at a
loss of cache efficiency.
The database UUID - normally the fact that each DB has its own URL
would suffice, but there are some edge cases where databases could be
renamed or replaced. DBs don't have UUIDs yet, so this is something we
can add to the Etag when we add per-DB UUIDs.
I think that with proper use of Etags, the JSON content of the views
becomes irrelevant for their cacheability.
--
Chris Anderson
http://jchris.mfdz.com
Re: Proposal: Extending immutability
Posted by Robert Dionne <bo...@gmail.com>.
I guess I can answer my own question, Chris' app in the database[1]
idea is a scenario where a user might clone the app then tweak the
design docs.
[1] http://jchris.mfdz.com/code/2008/11/my_couch_or_yours__shareable_ap
On Jan 5, 2009, at 1:27 PM, Robert Dionne wrote:
> Interesting proposal. A question:
>
> First, I'm a little confused by the use of the phrase referential
> transparency as I understand the more technical definition in the
> FP literature (call a function twice on the same input and get the
> same output), but I think I see the intended meaning, please
> clarify if you have another meaning.
>
> Do you have applications and/or envision use cases that are
> dynamic enough that you want to track design doc changes? It seems
> to me these are more a development time concern.
>
>
>
>
> On Jan 5, 2009, at 8:26 AM, Antony Blakey wrote:
>
>> I've cc'd this to couchdb-user, because I think this discussion
>> belongs on -dev, but everyone watches -user.
>>
>> One of the great features of Couch is the use of optimistic
>> locking i.e. rev as a bedrock mechanism, and the way this is
>> permeated through the API. The combination of id + rev is a
>> reference to an immutable value (with some caveats, one subject of
>> this proposal). This means that you get caching for free. By
>> keying off id + rev, you can cache the document along with any
>> (functional) derived values. Additionally you can trivially
>> memoize functions of multiple documents using that mechanism.
>>
>> I use this to good effect in my application, where I aggressively
>> cache the documents (which are sometimes large) and therefore
>> don't need the document content in queries. To take advantage of
>> this however this means that my views need to include the _rev as
>> the value, and transformation that would normally happen in the
>> map happens in the client.
>>
>> It would be very useful to have the rev returned wherever an id is
>> returned, specifically in view results. You could then use a view
>> without include_docs, and get the ids and revs. You can keep a
>> cache (per view, per db) of the results. The actual view results
>> only need to be fetched on a cache miss, which can be driven by
>> the cache machinery.
>>
>> The nice thing is that all of this caching machinery can be
>> transparently interposed. Except when the view definition is
>> changed. So I also propose to have the rev and id of the design
>> doc returned in the view results. And for completeness, every
>> database should be assigned a UUID when it is created. This UUID
>> should be provided in the dbinfo, and for every view and view-like
>> result.
>>
>> This means that from every view result you can construct a list of
>> universally unique references to immutable values e.g. DB UUID +
>> (View id + rev) + (Document id + rev). A form of referential
>> transparency - and with a cache and a little bit of 100% generic
>> machinery, it can be true referential transparency. Clients don't
>> have to watch/be notified about changes to design docs, or even
>> database creation/deletion. Systemwide transparent caching in
>> particular becomes trivial.
>>
>> So, in summary I propose:
>>
>> 1. Provide the document rev whenever the id is returned, such as
>> view results i.e. not in the document, but in the per-row hash.
>>
>> 2. Provide the design document id and rev in view results i.e. in
>> the top level hash.
>>
>> 3. Add a UUID to databases, and provide that in view results i.e.
>> in the top level hash, and all other database operation results.
>>
>> I think you could do this even with reduce results, but I haven't
>> thought a lot about it.
>>
>> I think this generalises the current API in a very useful way,
>> that will greatly simplify, and hence 'robustify' client code.
>> Although I haven't checked the implementation code, my experience
>> so far suggests this isn't difficult.
>>
>> Antony Blakey
>> -------------
>> CTO, Linkuistics Pty Ltd
>> Ph: 0438 840 787
>>
>> Every task involves constraint,
>> Solve the thing without complaint;
>> There are magic links and chains
>> Forged to loose our rigid brains.
>> Structures, structures, though they bind,
>> Strangely liberate the mind.
>> -- James Fallen
>>
>>
>
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 06/01/2009, at 4:57 AM, Robert Dionne wrote:
> First, I'm a little confused by the use of the phrase referential
> transparency as I understand the more technical definition in the FP
> literature (call a function twice on the same input and get the same
> output), but I think I see the intended meaning, please clarify if
> you have another meaning.
I'm using referential transparency in a conceptual manner - the idea
that the id + rev can be substituted for the document because the
reference is to an immutable value, so doc1 = doc2 is equivalent to
(id + rev)1 = (id + rev)2. If that equivalence holds even when design
docs and/or databases change (my proposal) then you can make stronger
claims about correctness, which enables simpler and more generic
reusable client facilities.
The property would be even more powerful if the rev was e.g. a hash of
the document. In fact, probabilistically this would be a substitute
for extending the document identity with view and database identities.
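A sketch of what a content-derived rev could look like, assuming the document is JSON-serialisable and that `_rev` itself is excluded from the hash. The function name `content_rev` and the use of SHA-256 are illustrative choices only.

```python
import hashlib
import json


def content_rev(doc: dict) -> str:
    """Hash of the document content, ignoring any existing _rev field."""
    body = {k: v for k, v in doc.items() if k != "_rev"}
    # Sorted keys and fixed separators give a canonical serialisation.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same content held in two different databases, plus an edited copy.
doc_in_db1 = {"_id": "page-1", "_rev": "1-abc", "title": "Hi"}
doc_in_db2 = {"title": "Hi", "_id": "page-1", "_rev": "7-zzz"}
doc_edited = {"_id": "page-1", "_rev": "2-def", "title": "Hello"}
```

With such a rev, equal references imply equal content with overwhelming probability regardless of which database or view produced them, which is why the extra identities would become unnecessary.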
> Do you have applications and/or envision use cases that are dynamic
> enough that you want to track design doc changes? It seems to me
> these are more a development time concern.
I know you subsequently answered this, but the issue is somewhat more
theoretical - if we extend immutability in this fashion then there are
simplifying benefits that allow such dynamic applications, even if we
can't think of them all now.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
A reasonable man adapts himself to suit his environment. An
unreasonable man persists in attempting to adapt his environment to
suit himself. Therefore, all progress depends on the unreasonable man.
-- George Bernard Shaw
Re: Proposal: Extending immutability
Posted by Robert Dionne <bo...@gmail.com>.
Interesting proposal. A question:
First, I'm a little confused by the use of the phrase referential
transparency as I understand the more technical definition in the FP
literature (call a function twice on the same input and get the same
output), but I think I see the intended meaning, please clarify if
you have another meaning.
Do you have applications and/or envision use cases that are dynamic
enough that you want to track design doc changes? It seems to me
these are more a development time concern.
Re: Proposal: Extending immutability
Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Jan 5, 2009 at 7:03 PM, Chris Anderson <jc...@gmail.com> wrote:
>
> CouchDB can't tell, without regenerating the view index, whether a
> given document update will affect a given key range of the view.
Sorry, to be clear I mean "without updating the view index" - updates
are much less expensive than a full regeneration.
--
Chris Anderson
http://jchris.mfdz.com
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 06/01/2009, at 2:37 PM, Chris Anderson wrote:
> The first gotcha I can see for the db uuid, is that for a cluster,
> which we want to be a single logical db, even if there are many nodes
> involved, would it be better to have a uuid per physical couch
> instance, or a single uuid for the cluster?
The combination of uuid + doc id + doc rev must identify an immutable
value. If the cluster provides that guarantee i.e. the illusion of a
single consistent value space, then the uuid could be per-cluster.
Presumably the cluster would have a mechanism to manage a
shared uuid value - if not, then it's probably failed the previous test.
You would want a shared uuid for a cluster, otherwise some of the
client benefit of the cluster is lost, because of a lack of value
sharing.
>>> , I'm still missing the obvious use case for extending them to
>>> view rows.
>>
>> Completeness. ... It's a
>> clean theoretical idea.
>>
>
> Believe me I appreciate the notion of being able to treat
> id/rev/db-uuids tuples as canonical. I'm still having a hard time
> coming up with a concrete use case that can't be accomplished without
> them.
Consider an expensive data structure that is built from a view row. I
would like to cache that structure. I can't just rely on the doc id, I
need the rev of the document that mapped to that row. I may not have
the rev in the value, or I may want to get the view row value only if
needed. I may be writing this without being in control of the view
definition.
Furthermore, I need the rev of the design document so that if the
design document were to be updated then my cache would continue to work
correctly without needing to code anything. I want the id of the
design document in the view result in case I've got a URL mapper in
front of my cache, so that if that mapping changes I continue to
work. I want the uuid of the database so that my cache works correctly
if the database is recreated.
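A rough sketch of the key this paragraph describes, assuming the view
response carried the proposed fields. The response shape used here
(db_uuid, design_doc.id, design_doc.rev, and a per-row rev) is
hypothetical - it is the proposal, not something current view results
provide - and rowCacheKey / makeRowCache are made-up names.

```javascript
// Hypothetical view response shape under the proposal:
//   { db_uuid, design_doc: { id, rev }, rows: [{ id, rev, key, value }] }
// The row key is included because one document can emit several rows.
function rowCacheKey(result, row) {
  return JSON.stringify([
    result.db_uuid,
    result.design_doc.id, result.design_doc.rev,
    row.id, row.rev, row.key
  ]);
}

// Cache the expensive structure built from a view row. build(row) runs
// only on a miss; a new doc rev, design doc rev, or database uuid gives
// a new key, so stale entries are simply never looked up again.
function makeRowCache(build) {
  const cache = new Map();
  return function lookup(result, row) {
    const key = rowCacheKey(result, row);
    if (!cache.has(key)) cache.set(key, build(row));
    return cache.get(key);
  };
}
```

Nothing in the cache depends on what the map function emits, which is
the point: the key is assembled purely from system-provided metadata.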
I'd like all of this to operate generically so I can publish the
mother of all functional couch caches up as a library on github.
And maybe a completely generic value-tracking cache that works in the
same way, but deals with caching resources that are functionally
derived from a number of different inputs, some documents, some view
results. When the identifiers of documents and view-results are
immutable values with referentially transparent keys, all of this is
easy. Without it, much domain-specific configuration needs to be done,
which is open to errors.
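The value-tracking variant could look something like this - a sketch
only, assuming each input is identified by a reference of the form
{ db, id, rev } (field names are my assumption) and that a caller-supplied
load(ref) function fetches the actual value.

```javascript
// Memoize a function of several immutable inputs. The cache key is built
// from the references alone, so the inputs are loaded only on a miss.
function memoizeByRefs(derive) {
  const cache = new Map();
  return function (refs, load) {
    const key = JSON.stringify(refs.map(function (r) {
      return [r.db, r.id, r.rev];
    }));
    if (!cache.has(key)) {
      cache.set(key, derive(refs.map(load)));
    }
    return cache.get(key);
  };
}
```

Because every reference names an immutable value, no invalidation logic
is needed: a changed input arrives under a different rev and therefore a
different key.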
And I'm 100% sure there are uses that I haven't even imagined.
Anything that relies on immutability, referential transparency, value
tracing etc, could possibly be applicable.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Every task involves constraint,
Solve the thing without complaint;
There are magic links and chains
Forged to loose our rigid brains.
Structures, structures, though they bind,
Strangely liberate the mind.
-- James Fallen
Re: Proposal: Extending immutability
Posted by Chris Anderson <jc...@gmail.com>.
On Tue, Jan 6, 2009 at 9:22 AM, Daniel DeLeo <de...@gmail.com> wrote:
> Stop me if there's a better way to do this already, but:
These are good use cases, it's just that they are paradigmatic use
cases for form functions. (I'm working on documentation for forms
right now, but there is useful stuff in couch_tests.js if you want a
head start.)
--
Chris Anderson
http://jchris.mfdz.com
Re: Proposal: Extending immutability
Posted by Daniel DeLeo <de...@gmail.com>.
Stop me if there's a better way to do this already, but:
One possible use case for this feature could be an app with a drawing
component, maybe a powerpoint-type thing, where the images are created and
stored as SVGs, but viewed (or previewed or thumbnailed) as pngs. The app
would know to recreate the png or at least test for changes to the SVG when
couch reports an updated rev.
Another use case could be a blog using markdown, possibly with markdown
comments. The app could create and cache the HTML version for faster
access, updating only when the source documents change.
On the other hand, with view code like 'emit([doc.whatever, doc._id,
doc._rev], null)' and using include_docs when the whole document is needed,
one could accomplish this without requiring anything new from couch, albeit
in a less elegant way.
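Spelled out, that workaround might look like the sketch below. The map
function is the one quoted above; the emit() stand-in, the staleRows
helper, and the idea of a "rendered" map from doc id to rev are
assumptions added for illustration.

```javascript
// Stand-in for the emit() that CouchDB's view server supplies, so the
// sketch can run outside CouchDB.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function from the message; doc.whatever is its placeholder.
function map(doc) {
  emit([doc.whatever, doc._id, doc._rev], null);
}

// Client side (hypothetical helper): return the rows whose cached
// derivative (PNG, HTML, ...) is out of date. `rendered` maps a doc id
// to the rev the cached derivative was built from.
function staleRows(rows, rendered) {
  return rows.filter(function (row) {
    const id = row.key[1];
    const rev = row.key[2];
    return rendered.get(id) !== rev;
  });
}
```

The cost is that the id and rev are now part of the view key, so every
view that wants this behaviour has to be written with it in mind.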
Just throwing it out there, but I think these use cases at least show how
such a feature could be valuable.
Cheers,
Dan DeLeo
On Mon, Jan 5, 2009 at 10:35 PM, Randall Leeds <ra...@gmail.com> wrote:
> On Mon, Jan 5, 2009 at 23:07, Chris Anderson <jc...@gmail.com> wrote:
>
> > Again, I'm not sure that having the rev available on the view row
> > makes a difference. The indexer cares about the keys and values, if
> > they've changed, they've changed.
> >
>
> This is true. However, it could be very expensive to compare the keys and
> values to see if they've changed. You could put the burden on the
> application developer to get around this by including some checksum in the
> view, or just leverage the immutability of CouchDB's revisions and provide
> it for free by returning the revision. I like this idea because it enables
> this blurry use case we're all sort of imagining together while keeping the
> actual map function focused on the real data that needs to be returned (as
> opposed to metadata that enables some slick performance gain).
>
> -Randall
>
Re: Proposal: Extending immutability
Posted by Randall Leeds <ra...@gmail.com>.
On Mon, Jan 5, 2009 at 23:07, Chris Anderson <jc...@gmail.com> wrote:
> Again, I'm not sure that having the rev available on the view row
> makes a difference. The indexer cares about the keys and values, if
> they've changed, they've changed.
>
This is true. However, it could be very expensive to compare the keys and
values to see if they've changed. You could put the burden on the
application developer to get around this by including some checksum in the
view, or just leverage the immutability of CouchDB's revisions and provide
it for free by returning the revision. I like this idea because it enables
this blurry use case we're all sort of imagining together while keeping the
actual map function focused on the real data that needs to be returned (as
opposed to metadata that enables some slick performance gain).
-Randall
Re: Proposal: Extending immutability
Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Jan 5, 2009 at 7:30 PM, Antony Blakey <an...@gmail.com> wrote:
>
> On 06/01/2009, at 1:33 PM, Chris Anderson wrote:
>
>> You could put it in doc._db_uuid.
>
> OTOH, I guess you could make it such that it is ignored by the db when
> writing, and inserted when returning the document. And maybe as a "where did
> I get this value from" it may have some as-yet-unforeseen usefulness.
That actually sounds kinda relaxing. :)
> Currently the view machinery supplies the id of the document for each row. I
> don't understand why it can't supply the rev of the document that was passed
> to map().
I think it could easily do that. I'm just trying to understand better
what good could come of it.
> - my concern is having the rev reliably in the result row with the
> id, so that generic machinery can act on view results independently of the
> map function.
>
OK I'm picturing a use case - for instance an external implementation
of reduce, which wants to be able to tell, by passing a key range
query through to couchdb, if it needs to recalculate its external
reduction value for that range. Functionally this is the same as
per-key-range etags (which the current view infrastructure doesn't
provide, because it would have to run the full view query in order to
provide them).
I can see that for an external indexer, such key-range etags could be
of real utility, even if they aren't particularly helpful to CouchDB
itself. Also, it does seem that the easiest way to calculate such an
etag is by having the view range in question available in the external
indexer. Eg - the indexer will have to query couchdb's map-view to see
if it has to recalculate a portion of its result.
Again, I'm not sure that having the rev available on the view row
makes a difference. The indexer cares about the keys and values, if
they've changed, they've changed.
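As a sketch of what such a key-range etag could be on the external
indexer's side: hash the rows that came back for the range, and only
recompute when the hash moves. The names rangeEtag and
makeExternalReducer are made up for this example, and the etag is
computed by the client from a full range query, not provided by CouchDB.

```javascript
const crypto = require("crypto");

// Fingerprint the rows returned for one key range. Rows are hashed in
// the order the view returned them, so an added, removed, or changed
// row changes the result. Assumes values serialize consistently.
function rangeEtag(rows) {
  const hash = crypto.createHash("sha1");
  rows.forEach(function (row) {
    hash.update(JSON.stringify([row.id, row.key, row.value]));
    hash.update("\n");
  });
  return hash.digest("hex");
}

// Hypothetical external reducer that recomputes only when the etag of
// its key range changes.
function makeExternalReducer(reduce) {
  let lastEtag = null;
  let lastValue;
  return function (rows) {
    const etag = rangeEtag(rows);
    if (etag !== lastEtag) {
      lastValue = reduce(rows);
      lastEtag = etag;
    }
    return lastValue;
  };
}
```

This version hashes keys and values, which is all the indexer strictly
needs; a per-row rev would only change what gets fed into the hash.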
>> So while the id/rev => doc-state equivalences are interesting (and
>> handy)
>
> Much handier if made globally unique with the db uuid.
>
I mostly agree.
The first gotcha I can see for the db uuid, is that for a cluster,
which we want to be a single logical db, even if there are many nodes
involved, would it be better to have a uuid per physical couch
instance, or a single uuid for the cluster?
>> , I'm still missing the obvious use case for extending them to
>> view rows.
>
> Completeness. ... It's a
> clean theoretical idea.
>
Believe me I appreciate the notion of being able to treat
id/rev/db-uuids tuples as canonical. I'm still having a hard time
coming up with a concrete use case that can't be accomplished without
them.
I've been around Couch long enough to know that the things that are
"missing" from CouchDB are often missing for a reason. I can't say
whether this is one of those cases.
I'm still curious for the killer app this feature enables.
--
Chris Anderson
http://jchris.mfdz.com
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
On 06/01/2009, at 1:33 PM, Chris Anderson wrote:
> You could put it in doc._db_uuid.
But it's not a document property - it qualifies the id/rev within a
value space, but it shouldn't travel with the document because it's a
function of which replica you happened to retrieve it from.
OTOH, I guess you could make it such that it is ignored by the db when
writing, and inserted when returning the document. And maybe as a
"where did I get this value from" it may have some as-yet-unforeseen
usefulness.
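In client terms that behaviour would amount to something like the
following sketch. _db_uuid is only the field name floated in this
thread, and stampOnRead / stripOnWrite are hypothetical helpers, not
CouchDB functions.

```javascript
// On read: return a copy of the document annotated with the uuid of
// the database it was fetched from.
function stampOnRead(doc, dbUuid) {
  return Object.assign({}, doc, { _db_uuid: dbUuid });
}

// On write: drop the annotation, so it never travels with the document
// into another replica.
function stripOnWrite(doc) {
  const copy = Object.assign({}, doc);
  delete copy._db_uuid;
  return copy;
}
```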
> CouchDB can't tell, without regenerating the view index, whether a
> given document update will affect a given key range of the view. So
> any query which needed the id and rev in order to determine cache
> freshness, would need to update the view.
Currently the view machinery supplies the id of the document for each
row. I don't understand why it can't supply the rev of the document
that was passed to map().
And the view machinery already tracks updates to design documents, so
storing the rev of the design document in the view structure shouldn't
be hard.
I must be misunderstanding something - I'm afraid your subsequent
email hasn't made it clearer to me.
> There is also not currently any facility to query for just the key of
> a view's key-value pair. I suppose we could add one, but the
> internal cost to CouchDB would be equivalent to a full row lookup
> (unless your values are very, very large).
A parallel to include_docs=false. Actually, this isn't *so* important
to me - my concern is having the rev reliably in the result
row with the id, so that generic machinery can act on view results
independently of the map function.
> So while the id/rev => doc-state equivalences are interesting (and
> handy)
Much handier if made globally unique with the db uuid.
> , I'm still missing the obvious use case for extending them to
> view rows.
Completeness. IMO this is a simple change that has significant
conceptual weight i.e. the idea of the universe of all Couch instances
holding hierarchic sequences of uniquely identifiable immutable values.
I must admit I'm not up on the form implementation, but I suggest that
anything that maps documents should include enough system-provided
meta-information to construct the global key for the constituent values.
I know the obvious use-case is caching, and one response is to rely on
Couch's HTTP-based caching, but this change would allow not only more
extensive caching independent of Couch's API model i.e. value-tracing
forms such as memoization, but IMO will also prove to have other
benefits. It's a clean theoretical idea.
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
The difference between ordinary and extraordinary is that little extra.
Re: Proposal: Extending immutability
Posted by Chris Anderson <jc...@gmail.com>.
On Mon, Jan 5, 2009 at 6:52 PM, Antony Blakey <an...@gmail.com> wrote:
>> 1. Provide the document rev whenever the id is returned, such as view
>> results i.e. not in the document, but in the per-row hash.
>
> I meant in the per-row hash *as well*. I'm not suggesting changing anything
> in the document, although the raw document get should return the db UUID,
> and I guess that would need to come as a HTTP header. Bit ugly, and not a
> great API, but as long as the document isn't wrapped, I can't see where else
> to put it.
You could put it in doc._db_uuid.
I think I'm beginning to see what you'd use this for. There are a few
implementation notes.
CouchDB can't tell, without regenerating the view index, whether a
given document update will affect a given key range of the view. So
any query which needed the id and rev in order to determine cache
freshness would need to update the view.
There is also not currently any facility to query for just the key of
a view's key-value pair. I suppose we could add one, but the
internal cost to CouchDB would be equivalent to a full row lookup
(unless your values are very, very large).
Now - if you were using the views primarily as a translation mechanism
(to reformat documents) but want document-like caching semantics for
those reformatted documents (and lookup by-id works for you), then the
new forms feature may be exactly what you need.
So while the id/rev => doc-state equivalences are interesting (and
handy), I'm still missing the obvious use case for extending them to
view rows.
--
Chris Anderson
http://jchris.mfdz.com
Re: Proposal: Extending immutability
Posted by Antony Blakey <an...@gmail.com>.
> 1. Provide the document rev whenever the id is returned, such as view
> results i.e. not in the document, but in the per-row hash.
I meant in the per-row hash *as well*. I'm not suggesting changing
anything in the document, although the raw document get should return
the db UUID, and I guess that would need to come as a HTTP header. Bit
ugly, and not a great API, but as long as the document isn't wrapped,
I can't see where else to put it.
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Human beings, who are almost unique in having the ability to learn
from the experience of others, are also remarkable for their apparent
disinclination to do so.
-- Douglas Adams
Re: Proposal: Extending immutability
Posted by Mahesh Paolini-Subramanya <ma...@aptela.com>.
+1
---
Mahesh Paolini-Subramanya
CTO, Aptela Inc.
(703.386.1500 x9100)
http://www.aptela.com