Posted to user@couchdb.apache.org by Antony Blakey <an...@gmail.com> on 2008/12/21 05:10:57 UTC

Re: couchdb (_external consistency issues and proposals)

(Posted to -dev because it has some development issues)

This is wrong BTW:

>        elsif doc["Type"] == "user"
>          doc["Roles"] && doc["Roles"].each do |r|
>            db.execute("replace into links values (?, ?, ?)",  
> db_name, doc_id, r);
>          end

because it doesn't handle modifications correctly. In my production  
code I do this:

   db.execute("delete from links where db = ? and src = ?", db_name, doc_id)
   doc["Roles"] && doc["Roles"].each do |r|
     db.execute("insert into links values (?, ?, ?)", db_name, doc_id, r)
   end

i.e. always delete and recreate the derived document. You can do  
incremental updates by reading from your indexes before updating. You  
cannot reliably get the previous rev (for differencing) because it may  
not exist.
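
A minimal sketch of the incremental alternative, assuming the roles already indexed for a document have been read back from the links table first. The helper name link_changes is hypothetical, and plain arrays stand in for the rows read from the index; this is an illustration, not the production code referred to above.

```ruby
# Hypothetical helper (not part of CouchDB or the original script).
# indexed_roles: roles currently stored in the index for this document.
# doc_roles:     roles in the updated document (may be nil).
# Returns the rows to delete and the rows to insert.
def link_changes(indexed_roles, doc_roles)
  current = indexed_roles || []
  wanted  = (doc_roles || []).uniq
  { delete: current - wanted, insert: wanted - current }
end

changes = link_changes(["admin", "editor"], ["editor", "reviewer"])
# changes[:delete] holds roles to remove from the index,
# changes[:insert] holds roles to add.
```

The diff is taken against the index rather than against a previous document revision, since the previous rev may not exist.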

My code also doesn't handle a database being deleted and then
re-created - the _external will think it has valid records, but they
belong to the previous database. You could detect that through
notifications, but once again I think it needs to be synchronous if
you want to reason about it. A likely-to-work-most-of-the-time
solution would be to detect update_seq < stored_update_seq. A better
solution would be for each db to have a UUID, so that you don't have
to rely on the name as the identity.
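
The update_seq check could be sketched roughly as follows. The method name database_recreated? and the idea of passing both sequence numbers in as integers are assumptions made for illustration; how the stored value is persisted is left out.

```ruby
# Hypothetical helper: heuristic detection of a deleted-and-re-created db.
# stored_update_seq:  the update_seq recorded when the index was last updated
#                     (nil if nothing has been indexed yet).
# current_update_seq: the update_seq reported for the database now.
def database_recreated?(stored_update_seq, current_update_seq)
  return false if stored_update_seq.nil?
  current_update_seq < stored_update_seq
end

database_recreated?(250, 12)   # sequence went backwards
database_recreated?(250, 251)  # sequence advanced normally
```

This is only a heuristic: if the re-created database has already received more updates than the stored sequence number, the check will not fire, which is why a per-database UUID would be more reliable.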

Also, if your _external doesn't get triggered for a long time, and  
while it's 'dormant' a document is deleted and the db is compacted,  
you could miss deletions. One solution to that is that every _external  
needs to be notified (synchronously) before a compaction so that it  
can update to the update_seq of the MVCC snapshot that the compaction  
will operate against. IMO a better solution is to have two UUIDs for
the database - one is per database, and one is 'per compaction'. Thus
an external will know if it needs to revalidate all the documents it
has indexed to check for missed deletions. You could just have
a per-compaction UUID, which would change if a db was deleted and then
created, thus triggering the same codepath, but this is a lot more
expensive than knowing that the entire db

Finally, note that this external operates for *every* database,  
whereas you may want to enable and configure it using a design  
document. Thus your external should always monitor updated design  
documents and check for enablement. You can record the configuration  
in the database (and cache it in the _external) and just ignore all  
other changes. Personally I don't bother because the lazy-creation  
means that no work is done unless I do an _external query, so  
databases which don't get queried, don't incur a cost, and I have no  
configuration data.

That's another reason to prefer a passive UUID-based identity scheme  
for db-create/delete and compaction detection rather than a  
notification system.

It would be good if each DB had two UUIDs, one per-db and one
per-compaction, i.e. changed in the MVCC snapshot during a compaction, and
that these be provided to every _external request.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

If at first you don’t succeed, try, try again. Then quit. No use being  
a damn fool about it
   -- W.C. Fields


Re: couchdb (_external consistency issues and proposals)

Posted by Antony Blakey <an...@gmail.com>.
On 22/12/2008, at 5:14 PM, Antony Blakey wrote:

> I now know that this is wrong, sorry. Document deletions are never  
> 'lost', and hence there's no need to track compaction generations.  
> That raises a very different issue I noted in 'History of deletion,  
> and the interaction with compactions' on couchdb-dev, but it's  
> nothing to do with _external.

Hmmm. Further digging reveals that the purge function will in fact  
remove the record of deletions. Luckily there's a purge_seq value  
supplied in the dbinfo result (and also in each _external call). By  
tracking this, an _external knows when to revalidate its documents.
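
A rough sketch of that purge_seq tracking, assuming the _external keeps a small state record per database. IndexState and revalidation_needed? are hypothetical names, and how purge_seq is pulled out of the dbinfo result or the request is deliberately not shown.

```ruby
# Hypothetical per-database state kept by the _external process.
IndexState = Struct.new(:update_seq, :purge_seq)

# Returns true when the database's purge_seq differs from the one recorded
# at the last index update, i.e. a purge may have removed deletion records
# and the indexed documents should be revalidated.
def revalidation_needed?(state, current_purge_seq)
  return false if state.nil? || state.purge_seq.nil?
  current_purge_seq != state.purge_seq
end

state = IndexState.new(120, 3)
revalidation_needed?(state, 3)  # purge_seq unchanged
revalidation_needed?(state, 4)  # a purge happened since the last update
```

After revalidating, the stored purge_seq would be updated to the current value so the check stays quiet until the next purge.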

Purging can break replication, especially in distributed systems  
without centralized knowledge or control of replication status (which  
is my situation).

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Lack of will power has caused more failure than lack of intelligence  
or ability.
  -- Flower A. Newhouse


Re: couchdb (_external consistency issues and proposals)

Posted by Antony Blakey <an...@gmail.com>.
On 21/12/2008, at 2:40 PM, Antony Blakey wrote:

> Also, if your _external doesn't get triggered for a long time, and  
> while it's 'dormant' a document is deleted and the db is compacted,  
> you could miss deletions. One solution to that is that every  
> _external needs to be notified (synchronously) before a compaction  
> so that it can update to the update_seq of the MVCC snapshot that  
> the compaction will operate against. IMO a better solution is to  
> have two UUID's for the database - one is per database, and one is  
> 'per compaction'. Thus an external will know if it needs to  
> revalidate all the documents it has indexed to check for missed  
> deletions updates. You could just have a per-compaction UUID, which  
> would change if a db was deleted and then created, this triggering  
> the same codepath, but this is a lot more expensive than knowing  
> that the entire db

I now know that this is wrong, sorry. Document deletions are never  
'lost', and hence there's no need to track compaction generations.  
That raises a very different issue I noted in 'History of deletion,  
and the interaction with compactions' on couchdb-dev, but it's nothing  
to do with _external.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Reflecting on W.H. Auden's contemplation of 'necessary murders' in the  
Spanish Civil War, George Orwell wrote that such amorality was only  
really possible, 'if you are the kind of person who is always  
somewhere else when the trigger is pulled'.
   -- John Birmingham, "Appeasing Jakarta"


