Posted to dev@couchdb.apache.org by Filipe David Manana <fd...@gmail.com> on 2010/02/01 19:58:29 UTC

associating UUIDs to DBs

Hi everybody,

Recently there was a suggestion at #couchdb, involving me, Chris Anderson
and Adam Kocoloski, about the possibility of adding UUIDs to a DB.

It appeared in the context of a future _replications DB (which would store
history about replication sessions, etc). It would not be desirable to
replicate this DB to other nodes, since that would mess up their
replication session history.

It might not be desirable to accidentally replicate other system related DBs
or user DBs (for some app specific reason).
Then Chris suggested the possibility of adding a UUID to the DB. This UUID
would be listed when doing a GET /somedb.

At that moment I had to run and had no time to pose questions about this
feature.
I would like to understand it, have 1 or 2 example use cases, and figure out
how it would impact the current code base, and code it.

What I am thinking about is that a source DB which has a UUID associated
with it cannot be replicated into a target DB unless the replication
object specifies the source's UUID.

Example, for a DB testdb with UUID=qwerty

POST /_replicate/
{ "source": "testdb", "target": "testdb_copy" }

Would fail, while doing:

POST /_replicate/
{ "source": "testdb", "source_uuid": "qwerty", "target": "testdb_copy" }

would succeed.

To create a DB with a UUID, we would just use a query parameter to specify
it, or use a boolean parameter to let couch generate it, for example. If
neither of these query parameters is given, the UUID is simply not generated.
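
A minimal sketch of the check proposed above, assuming the server keeps a
map from DB name to UUID. The names check_source_uuid, ReplicationRefused
and db_uuids are made up for illustration and do not exist in CouchDB.

```python
# Hypothetical sketch of the proposed check; none of these names exist in CouchDB.
class ReplicationRefused(Exception):
    pass


def check_source_uuid(db_uuids, request):
    """Raise unless the request names the source's UUID (when the source has one)."""
    expected = db_uuids.get(request["source"])  # None: DB was created without a UUID
    if expected is None:
        return  # no UUID on the source, so replication is unrestricted
    if request.get("source_uuid") != expected:
        raise ReplicationRefused(
            "source_uuid missing or wrong for %s" % request["source"])


db_uuids = {"testdb": "qwerty"}

# Accepted: the request carries the matching UUID.
check_source_uuid(
    db_uuids,
    {"source": "testdb", "source_uuid": "qwerty", "target": "testdb_copy"})

# Refused: no source_uuid in the request.
try:
    check_source_uuid(db_uuids, {"source": "testdb", "target": "testdb_copy"})
    refused = False
except ReplicationRefused:
    refused = True
```

A DB created without a UUID passes the check unconditionally, which matches
the "UUID is simply not generated" case.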

Now, I might not have understood completely what Chris and Adam have in
mind, which is why I would like to collect some feedback from any of you.
I think I'm missing something from the big picture.

best regards,

-- 
Filipe David Manana,
fdmanana@gmail.com
PGP key - http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC569452B

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
I've been thinking about UUIDs for DBs myself. I see a couple cases
where this could come in handy. Both involve replication:

1)
IIRC, pull replication with an "http://..." (but local!) target and a
plain "dbname" target do not share checkpoints. Using a UUID would
allow CouchDB to easily identify whether it is pulling to a local
database even if the target specifies a URI and even if hostname
reports something other than the domain of the URI.

2)
If replication checkpoints store the UUID of the database the changes
were read from it would allow CouchDB to replicate checkpoints. For
example, imagine a continuous bi-directional replication between nodes
A and B. Now, add node C to the group and replicate from B to C. If we
then set up replication between A and C, C would not have to scan all
of A's changes: by comparing its checkpoint source UUIDs to A's UUID,
it can tell that it already holds the checkpoints B made when B
replicated from A.
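
As a rough illustration of case 2, here is a sketch that assumes each
checkpoint records the UUID of the database the changes were read from and
the last source sequence applied. The record layout and the start_seq helper
are assumptions for this example, not the actual checkpoint format.

```python
# Hypothetical data model: each checkpoint records the UUID of the database
# the changes were read from and the last source sequence that was applied.
def start_seq(checkpoints, source_uuid):
    """Return the highest checkpointed sequence for source_uuid, or 0 if none."""
    seqs = [c["source_seq"] for c in checkpoints if c["source_uuid"] == source_uuid]
    return max(seqs, default=0)


# B pulled from A (UUID "uuid-a") up to sequence 120 and recorded that.
checkpoints_on_b = [{"source_uuid": "uuid-a", "source_seq": 120}]

# Replicating B -> C also carries B's checkpoints over to C, and C records
# its own checkpoint against B.
checkpoints_on_c = list(checkpoints_on_b) + [
    {"source_uuid": "uuid-b", "source_seq": 300},
]

# A -> C can now resume from A's sequence 120 instead of scanning from 0.
resume_from = start_seq(checkpoints_on_c, "uuid-a")
```

This only holds if B had finished applying A's changes before the B to C
replication ran, which the sketch takes for granted.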

I can't claim to understand the internals of the replication code very
well (yet...), but I believe my claims to be correct and welcome
anyone who knows better to step in and say otherwise. Assuming I got
my facts straight, I'd love to see this for 0.11, as well as a
replications db for continuous replications that survive restarts. Oh,
and replication based on _changes.

-Randall

P.S. Will work for alcohol.

On Mon, Feb 1, 2010 at 12:38, Robert Newson <ro...@gmail.com> wrote:
> Can't you make a _local document with your own unique identifier?
> _local documents are not replicated.
>
> B.

Re: associating UUIDs to DBs

Posted by Robert Newson <ro...@gmail.com>.
Can't you make a _local document with your own unique identifier?
_local documents are not replicated.

B.


Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
My mind immediately turns to git histories... but do we want to go there?

Re: associating UUIDs to DBs

Posted by Adam Kocoloski <ko...@apache.org>.
On Feb 4, 2010, at 5:05 PM, Randall Leeds wrote:

> On Thu, Feb 4, 2010 at 08:17, Adam Kocoloski <ko...@apache.org> wrote:
>> 
>> If we went ahead and implemented this I think the UUID becomes superfluous from the replicator's perspective.  You wouldn't want to restrict this Merkle tree check to UUID-matched DBs, as it would be useful for reducing entropy in a sharded database cluster that stores multiple copies of each document in different database shards.  In fact, IIRC that was a Dynamo feature in the original Amazon paper.
> 
> I mostly follow and I think I agree.
> Can you clarify "as it would be useful for reducing entropy..."?
> 
> Randall

Sure, that was too terse on my part.  I'm referring to the case where you're promising to write N copies of a document in your cluster, but for whatever reason you only succeed W<N times.  Hence "entropy" -- the N shards start diverging from one another after transient failures.

You want those missing writes to eventually propagate to the N-W shards that didn't get them.  CouchDB's _changes replication works for this purpose, but it's relatively resource-intensive because it checks for the existence of every update on the target.  I suspect that comparing Merkle trees may be a more efficient way to figure out what to replicate in this special case where the two DBs are always supposed to be identical.  Cheers,

Adam


Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
On Thu, Feb 4, 2010 at 08:17, Adam Kocoloski <ko...@apache.org> wrote:
>
> If we went ahead and implemented this I think the UUID becomes superfluous from the replicator's perspective.  You wouldn't want to restrict this Merkle tree check to UUID-matched DBs, as it would be useful for reducing entropy in a sharded database cluster that stores multiple copies of each document in different database shards.  In fact, IIRC that was a Dynamo feature in the original Amazon paper.

I mostly follow and I think I agree.
Can you clarify "as it would be useful for reducing entropy..."?

Randall

Re: associating UUIDs to DBs

Posted by Adam Kocoloski <ko...@apache.org>.
On Feb 4, 2010, at 10:44 AM, Paul Davis wrote:

> On Thu, Feb 4, 2010 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:
>> On Feb 3, 2010, at 4:53 AM, Brian Candler wrote:
>> 
>>> On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
>>>> If couchdb tracked replication by a Merkle tree, it would obsolete the
>>>> update_seq mechanism?
>>> 
>>> Only if you weren't doing filtered/selective replication. And probably only
>>> if there was nothing else different between the two databases (e.g. _local
>>> docs, _design docs, reader acls etc)
>> 
>> Correct, Merkle trees are only useful if you expect the two databases to be completely identical.  But Bob's right, I'm essentially proposing that our by_seq btree is extended into a full Merkle tree for this particular use-case.
>> 
>> Adam
> 
> Most intriguing. Could you expand on that a bit?
> 
> Paul

Hi Paul,

The more I think about it, using by_seq may not be the optimal choice here.  Consider the case where I snapshot my .couch file over to a new server, and in the meantime I update the document that was occupying update_seq 1 on the original.  The analysis I proposed above would conclude that the replication needs to start from the beginning, which is true, but overlooks the fact that only one document has changed.

An alternative would be to do the Merkle stuff in the by_id tree, and instead of identifying the last update_seq where two DBs are identical, identify the set of documents that differ between the two DBs.  Replicate just those documents using Filipe's new patch, then record a checkpoint at the source's latest update_seq.  You're now fully caught up in case you're planning any future _changes-based incremental replications.
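
To make the by_id idea concrete, here is a small sketch. It uses a two-level
hash tree (hashed buckets under one root) as a simplified stand-in for a
Merkle tree over the by_id btree; the bucket layout, the id=rev serialization
and the function names are assumptions for illustration only.

```python
import hashlib


def _h(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_tree(docs, buckets=16):
    """docs maps doc id -> current rev. Returns (root, bucket_hashes, bucket_docs)."""
    bucket_docs = [dict() for _ in range(buckets)]
    for doc_id, rev in docs.items():
        bucket_docs[int(_h(doc_id), 16) % buckets][doc_id] = rev
    bucket_hashes = [
        _h("".join("%s=%s;" % (i, r) for i, r in sorted(b.items())))
        for b in bucket_docs
    ]
    root = _h("".join(bucket_hashes))
    return root, bucket_hashes, bucket_docs


def differing_ids(source_docs, target_docs):
    """Ids whose rev differs, or that exist on one side only."""
    s_root, s_hashes, s_buckets = build_tree(source_docs)
    t_root, t_hashes, t_buckets = build_tree(target_docs)
    if s_root == t_root:
        return set()  # identical databases: nothing to compare further
    diff = set()
    for s_hash, t_hash, s_b, t_b in zip(s_hashes, t_hashes, s_buckets, t_buckets):
        if s_hash == t_hash:
            continue  # whole bucket matches, skip its documents
        for doc_id in set(s_b) | set(t_b):
            if s_b.get(doc_id) != t_b.get(doc_id):
                diff.add(doc_id)
    return diff


source = {"doc1": "2-abc", "doc2": "1-def", "doc3": "1-ghi"}
target = {"doc1": "1-aaa", "doc2": "1-def", "doc3": "1-ghi"}  # stale snapshot
changed = differing_ids(source, target)
```

In the snapshot scenario above, only doc1 is reported, so only that document
would need to be replicated before recording the checkpoint.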

If we went ahead and implemented this I think the UUID becomes superfluous from the replicator's perspective.  You wouldn't want to restrict this Merkle tree check to UUID-matched DBs, as it would be useful for reducing entropy in a sharded database cluster that stores multiple copies of each document in different database shards.  In fact, IIRC that was a Dynamo feature in the original Amazon paper.

Adam





Re: associating UUIDs to DBs

Posted by Paul Davis <pa...@gmail.com>.
On Thu, Feb 4, 2010 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:
> On Feb 3, 2010, at 4:53 AM, Brian Candler wrote:
>
>> On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
>>> If couchdb tracked replication by a Merkle tree, it would obsolete the
>>> update_seq mechanism?
>>
>> Only if you weren't doing filtered/selective replication. And probably only
>> if there was nothing else different between the two databases (e.g. _local
>> docs, _design docs, reader acls etc)
>
> Correct, Merkle trees are only useful if you expect the two databases to be completely identical.  But Bob's right, I'm essentially proposing that our by_seq btree is extended into a full Merkle tree for this particular use-case.
>
> Adam

Most intriguing. Could you expand on that a bit?

Paul

Re: associating UUIDs to DBs

Posted by Adam Kocoloski <ko...@apache.org>.
On Feb 3, 2010, at 4:53 AM, Brian Candler wrote:

> On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
>> If couchdb tracked replication by a Merkle tree, it would obsolete the
>> update_seq mechanism?
> 
> Only if you weren't doing filtered/selective replication. And probably only
> if there was nothing else different between the two databases (e.g. _local
> docs, _design docs, reader acls etc)

Correct, Merkle trees are only useful if you expect the two databases to be completely identical.  But Bob's right, I'm essentially proposing that our by_seq btree is extended into a full Merkle tree for this particular use-case.

Adam

Re: associating UUIDs to DBs

Posted by Brian Candler <B....@pobox.com>.
On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
> If couchdb tracked replication by a Merkle tree, it would obsolete the
> update_seq mechanism?

Only if you weren't doing filtered/selective replication. And probably only
if there was nothing else different between the two databases (e.g. _local
docs, _design docs, reader acls etc)

Re: associating UUIDs to DBs

Posted by Robert Newson <ro...@gmail.com>.
Seems to be what Merkle trees are for, which would allow for the kinds
of fast-forwarding this thread appears to be discussing. I think
that's essentially (or exactly) what git does, fwiw.

If couchdb tracked replication by a Merkle tree, it would obsolete the
update_seq mechanism?

B.

On Tue, Feb 2, 2010 at 8:17 PM, Adam Kocoloski <ko...@apache.org> wrote:
> On Feb 2, 2010, at 2:48 PM, Randall Leeds wrote:
>
>> On Tue, Feb 2, 2010 at 11:39, Chris Anderson <jc...@apache.org> wrote:
>>> On Tue, Feb 2, 2010 at 11:25 AM, Randall Leeds <ra...@gmail.com> wrote:
>>>> I'm not entirely happy with this patch and I'd like some help figuring
>>>> out what to do about it.
>>>>
>>>> I foresee problems when database files are copied or backed up on
>>>> disk. It's possible to end up with two couchdb instances hosting
>>>> databases with the same uuid. The problem is that the uuid is no
>>>> longer meaningful, as it doesn't do what it was intended to (uniquely
>>>> identify the database).
>>>>
>>>> Can anyone see a way around this?
>>>>
>>>
>>> I think we don't mind this. As I mentioned above, when we see that 2
>>> db files have the same uuid we can do a fast-forward replication by
>>> starting from the lower of the 2 dbs sequence #s for replication.
>>> (maybe... Adam, does this sound sane?)
>>
>> If changes had been made to both dbs separately then the lower
>> sequence # might be beyond the sequence number at which the histories
>> diverged and the changes to the "younger" db would be lost.
>
> Yes, that's the problem we'll need to solve if we're going to use UUIDs to fast-forward replication.  Off the top of my head, one way to do that would be to store a DB revid calculated in the same way as the document revids (and seed it with the UUID at the beginning).  Then if you find an update_seq where the revision IDs match, you can start the replication from that point.
>
> There may be cheaper ways, though.
>
> Adam

Re: associating UUIDs to DBs

Posted by Adam Kocoloski <ko...@apache.org>.
On Feb 2, 2010, at 2:48 PM, Randall Leeds wrote:

> On Tue, Feb 2, 2010 at 11:39, Chris Anderson <jc...@apache.org> wrote:
>> On Tue, Feb 2, 2010 at 11:25 AM, Randall Leeds <ra...@gmail.com> wrote:
>>> I'm not entirely happy with this patch and I'd like some help figuring
>>> out what to do about it.
>>> 
>>> I foresee problems when database files are copied or backed up on
>>> disk. It's possible to end up with two couchdb instances hosting
>>> databases with the same uuid. The problem is that the uuid is no
>>> longer meaningful, as it doesn't do what it was intended to (uniquely
>>> identify the database).
>>> 
>>> Can anyone see a way around this?
>>> 
>> 
>> I think we don't mind this. As I mentioned above, when we see that 2
>> db files have the same uuid we can do a fast-forward replication by
>> starting from the lower of the 2 dbs sequence #s for replication.
>> (maybe... Adam, does this sound sane?)
> 
> If changes had been made to both dbs separately then the lower
> sequence # might be beyond the sequence number at which the histories
> diverged and the changes to the "younger" db would be lost.

Yes, that's the problem we'll need to solve if we're going to use UUIDs to fast-forward replication.  Off the top of my head, one way to do that would be to store a DB revid calculated in the same way as the document revids (and seed it with the UUID at the beginning).  Then if you find an update_seq where the revision IDs match, you can start the replication from that point.

There may be cheaper ways, though.
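
A rough sketch of the DB revid idea, assuming the revid is a running MD5
chain seeded with the UUID and that one revid is kept per update_seq. The
hashing scheme, the string form of each update and the helper names are all
assumptions for this example.

```python
import hashlib


def db_revids(uuid, updates):
    """Return the chained DB revid after each update_seq (index 0 is seq 0)."""
    revid = hashlib.md5(uuid.encode("utf-8")).hexdigest()  # seeded with the UUID
    chain = [revid]
    for update in updates:
        revid = hashlib.md5((revid + update).encode("utf-8")).hexdigest()
        chain.append(revid)
    return chain


def last_common_seq(chain_a, chain_b):
    """Highest update_seq at which both databases have the same DB revid, or -1."""
    seq = -1
    for i, (a, b) in enumerate(zip(chain_a, chain_b)):
        if a != b:
            break
        seq = i
    return seq


shared = ["doc1@1-a", "doc2@1-b", "doc3@1-c"]  # history before the file copy
a = db_revids("qwerty", shared + ["doc1@2-x"])  # original keeps changing
b = db_revids("qwerty", shared + ["doc4@1-y", "doc2@2-z"])  # copy diverges
fork_point = last_common_seq(a, b)
```

Replication would then start from fork_point rather than from the lower of
the two current sequence numbers.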

Adam

Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
On Tue, Feb 2, 2010 at 11:39, Chris Anderson <jc...@apache.org> wrote:
> On Tue, Feb 2, 2010 at 11:25 AM, Randall Leeds <ra...@gmail.com> wrote:
>> I'm not entirely happy with this patch and I'd like some help figuring
>> out what to do about it.
>>
>> I foresee problems when database files are copied or backed up on
>> disk. It's possible to end up with two couchdb instances hosting
>> databases with the same uuid. The problem is that the uuid is no
>> longer meaningful, as it doesn't do what it was intended to (uniquely
>> identify the database).
>>
>> Can anyone see a way around this?
>>
>
> I think we don't mind this. As I mentioned above, when we see that 2
> db files have the same uuid we can do a fast-forward replication by
> starting from the lower of the 2 dbs sequence #s for replication.
> (maybe... Adam, does this sound sane?)

If changes had been made to both dbs separately then the lower
sequence # might be beyond the sequence number at which the histories
diverged and the changes to the "younger" db would be lost.
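
A worked example of that concern, with made-up numbers: the file is copied at
update_seq 5, after which both copies are edited independently.

```python
# Hypothetical numbers: a file copy taken at update_seq 5, then both sides edited.
fork_seq = 5
source_seq = 10  # the copy that will be read from
target_seq = 8   # the copy that will be written to

since = min(source_seq, target_seq)  # "start from the lower" gives 8

# Sequences 6..10 on the source were all made after the copy.
source_changes = list(range(fork_seq + 1, source_seq + 1))
sent = [s for s in source_changes if s > since]      # what replication would read
skipped = [s for s in source_changes if s <= since]  # never reach the target
```

Here sequences 6 to 8 on the source are skipped even though the target has
never seen them.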

Re: associating UUIDs to DBs

Posted by Chris Anderson <jc...@apache.org>.
On Tue, Feb 2, 2010 at 11:25 AM, Randall Leeds <ra...@gmail.com> wrote:
> I'm not entirely happy with this patch and I'd like some help figuring
> out what to do about it.
>
> I foresee problems when database files are copied or backed up on
> disk. It's possible to end up with two couchdb instances hosting
> databases with the same uuid. The problem is that the uuid is no
> longer meaningful, as it doesn't do what it was intended to (uniquely
> identify the database).
>
> Can anyone see a way around this?
>

I think we don't mind this. As I mentioned above, when we see that 2
db files have the same uuid we can do a fast-forward replication by
starting from the lower of the 2 dbs sequence #s for replication.
(maybe... Adam, does this sound sane?)

-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
I'm not entirely happy with this patch and I'd like some help figuring
out what to do about it.

I foresee problems when database files are copied or backed up on
disk. It's possible to end up with two couchdb instances hosting
databases with the same uuid. The problem is that the uuid is no
longer meaningful, as it doesn't do what it was intended to (uniquely
identify the database).

Can anyone see a way around this?

Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
As requested by Chris, and what I knew I should have just done to
begin with (doh!):
https://issues.apache.org/jira/browse/COUCHDB-477

On Mon, Feb 1, 2010 at 23:55, Randall Leeds <ra...@gmail.com> wrote:
> Here's a patch.
>
> On Mon, Feb 1, 2010 at 22:02, Nicholas Orr <ni...@zxgen.net> wrote:
>> On Tue, Feb 2, 2010 at 4:16 PM, Chris Anderson <jc...@apache.org> wrote:
>>
>>> UUIDs will be useful for a lot of things. My favorite bug we see now
>>> from not having a uuid, is when you are prototyping an app in the
>>> browser (with a function to generate X random docs for testing):
>>>
>>> When you apply the same number of randomized edits to a db, that has
>>> the same view code each time, the browser will occasionally present
>>> data from previous runs.
>
> I added a test to etags_view.js that exercises this, both with a
> design document view and _all_docs.
>
>>>
>>> This is because the Etag function does not have a database UUID for
>>> input (only the sequence number, and the ddoc signature) so after n
>>> edits with the same code you have the same Etag. The easiest fix for
>>> this is to generate a random uuid on database creation, and factor it
>>> into the Etag function.
>
> I stored the uuid in the database header. Doing this made the patch
> really simple since couch will upgrade the database format without any
> additional code and provide the uuid in the /db_name response object.
>
> I'd like to take a crack at making replication take advantage of this
> information, but for now it could still be useful for clients or
> frameworks to interact with the uuid so I'm submitting just this for
> your review.
>
> Comments welcome, but it seems pretty straightforward. I have this
> nagging feeling there are people who might be uneasy with the idea of
> putting more things in the database header, but this was by far the
> cleanest way I could see to do this.
>
> Randall
>

Re: associating UUIDs to DBs

Posted by Randall Leeds <ra...@gmail.com>.
Here's a patch.

On Mon, Feb 1, 2010 at 22:02, Nicholas Orr <ni...@zxgen.net> wrote:
> On Tue, Feb 2, 2010 at 4:16 PM, Chris Anderson <jc...@apache.org> wrote:
>
>> UUIDs will be useful for a lot of things. My favorite bug we see now
>> from not having a uuid, is when you are prototyping an app in the
>> browser (with a function to generate X random docs for testing):
>>
>> When you apply the same number of randomized edits to a db, that has
>> the same view code each time, the browser will occasionally present
>> data from previous runs.

I added a test to etags_view.js that exercises this, both with a
design document view and _all_docs.

>>
>> This is because the Etag function does not have a database UUID for
>> input (only the sequence number, and the ddoc signature) so after n
>> edits with the same code you have the same Etag. The easiest fix for
>> this is to generate a random uuid on database creation, and factor it
>> into the Etag function.

I stored the uuid in the database header. Doing this made the patch
really simple since couch will upgrade the database format without any
additional code and provide the uuid in the /db_name response object.

I'd like to take a crack at making replication take advantage of this
information, but for now it could still be useful for clients or
frameworks to interact with the uuid so I'm submitting just this for
your review.

Comments welcome, but it seems pretty straightforward. I have this
nagging feeling there are people who might be uneasy with the idea of
putting more things in the database header, but this was by far the
cleanest way I could see to do this.

Randall

Re: associating UUIDs to DBs

Posted by Nicholas Orr <ni...@zxgen.net>.
On Tue, Feb 2, 2010 at 4:16 PM, Chris Anderson <jc...@apache.org> wrote:

> UUIDs will be useful for a lot of things. My favorite bug we see now
> from not having a uuid, is when you are prototyping an app in the
> browser (with a function to generate X random docs for testing):
>
> When you apply the same number of randomized edits to a db, that has
> the same view code each time, the browser will occasionally present
> data from previous runs.
>
> This is because the Etag function does not have a database UUID for
> input (only the sequence number, and the ddoc signature) so after n
> edits with the same code you have the same Etag. The easiest fix for
> this is to generate a random uuid on database creation, and factor it
> into the Etag function.
>

I have run into this and it was most confusing. I'd destroy and recreate
the database, add the same ddocs, create some new docs, and I'd see the doc
_id from before in the browser. Very annoying. Then I figured out the
browser was caching stuff, so now I use ctrl+f5 in firefox to actually
refresh the data on the screen...

Re: associating UUIDs to DBs

Posted by Chris Anderson <jc...@apache.org>.
On Mon, Feb 1, 2010 at 10:58 AM, Filipe David Manana <fd...@gmail.com> wrote:
> Hi everybody,
>
> Recently there was a suggestion at #couchdb, involving me, Chris Anderson
> and Adam Kocoloski, about the possibility of adding UUIDs to a DB.
>
> It appeared in the context of a future _replications DB (which would store
> history about replication sessions, etc). It would not be desirable to
> replicate this DB into other nodes, otherwise it would mess up with their
> replication session history.
>
> It might not be desirable to accidentally replicate other system related DBs
> or user DBs (for some app specific reason).
> Then Chris suggested the possibility of adding a UUID to the DB. This UUID
> would be listed when doing a GET /somedb.
>
> At that moment I had to run and had no time to pose questions about this
> feature.
> I would like to understand it, have 1 or 2 example use cases, and figure out
> how it would impact the current code base, and code it.
>

UUIDs will be useful for a lot of things. My favorite bug we see now
from not having a uuid is when you are prototyping an app in the
browser (with a function to generate X random docs for testing):

When you apply the same number of randomized edits to a db that has
the same view code each time, the browser will occasionally present
data from previous runs.

This is because the Etag function does not have a database UUID for
input (only the sequence number, and the ddoc signature) so after n
edits with the same code you have the same Etag. The easiest fix for
this is to generate a random uuid on database creation, and factor it
into the Etag function.
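
A small sketch of the collision and the fix, assuming the Etag is a hash over
its inputs. The view_etag function and the way the inputs are joined are
illustrative only, not the actual CouchDB Etag code.

```python
import hashlib


def view_etag(update_seq, ddoc_sig, db_uuid=None):
    """Hash the inputs named above; db_uuid is the proposed extra input."""
    parts = [str(update_seq), ddoc_sig]
    if db_uuid is not None:
        parts.append(db_uuid)
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Two runs: the db is deleted and recreated, then gets the same number of edits.
without_uuid_run1 = view_etag(25, "sig-of-ddoc")
without_uuid_run2 = view_etag(25, "sig-of-ddoc")

# With a random uuid generated at each creation, the inputs differ per run.
with_uuid_run1 = view_etag(25, "sig-of-ddoc", db_uuid="uuid-from-first-create")
with_uuid_run2 = view_etag(25, "sig-of-ddoc", db_uuid="uuid-from-second-create")
```

Without the uuid both runs produce the same Etag, so the browser is entitled
to reuse its cached response; with it, the Etags differ.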

The replicator issue is how to detect whether 2 databases have a
replication history (because hostnames can change and people can rsync
files, it makes more sense to keep a uuid around).

It occurs to me that we could use the case of a matching uuid to
trigger fast-forward replication when block-level file duplicates come
online in a cluster.

Chris






-- 
Chris Anderson
http://jchrisa.net
http://couch.io