Posted to user@couchdb.apache.org by John Merrells <jo...@merrells.com> on 2010/04/11 21:21:57 UTC

purge old revisions of documents

Hello,

What's the best approach for periodically purging old documents?
Particularly old revisions of documents. 

Ideally I'd just define a policy, like 'delete any revisions older than 
30 days', and some background reaper process would take care 
of it. I'm fine implementing some code to do this myself, but maybe
there is a better way :-)

John

-- 
John Merrells
http://johnmerrells.com
+1.415.244.5808

Re: purge old revisions of documents

Posted by Sebastian Cohnen <se...@googlemail.com>.
Hey John,

On 12.04.2010, at 05:43, John Merrells wrote:

> 
> Thanks. I'd assumed that document revisions were kept forever (à la git)
> and that compaction only changed the organization of the database,
> rather than the data itself.

The point is that the _rev token is not *really* for versioning; it is only used for concurrency control. That being said, you can think of compaction as a kind of garbage collection (like git gc): it not only reorganizes the database but also purges data that is no longer needed.

Re: purge old revisions of documents

Posted by Randall Leeds <ra...@gmail.com>.
On Mon, Apr 12, 2010 at 05:58, John Merrells <jo...@merrells.com> wrote:
>
> Anyway, you might consider allowing the client to specify its constraint
> requirements, rather than just enforcing the strictest constraints upon it.
> Not all clients care which revision of a document their updates are applied
> to. I've already run into this myself this week, as multiple machines were
> all trying to update the same document on a single server with identical
> content and they got into a race...

Open a JIRA ticket as an enhancement or bring it up on dev@ if you
want to start a little discussion about it. I very much like the
default behavior now. I think it's very 'safe' in that it forces the
developer to think about conflict scenarios as they design their app.
However, something like an "X-Couch-Force-Update: true" header doesn't
seem unreasonable to me. It satisfies your desires without
compromising the default. I know at least one person I've talked to
has requested this feature specifically to make DELETE easier.
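
Until something like that exists, a client that does not care which revision it overwrites can get a similar effect by re-reading and retrying on conflict. The following is only a rough sketch of that loop: `read`, `write`, `Conflict`, and `force_update` are hypothetical names, and the two callables stand in for whatever HTTP calls the client really makes. The header mentioned above is just a suggestion in this thread and is not used here.

```python
class Conflict(Exception):
    """The write named a revision that is no longer the latest."""


def force_update(read, write, doc_id, body, max_attempts=5):
    """Store body under doc_id no matter which revision is current.

    read(doc_id) -> (rev, body), with rev None if the document is missing.
    write(doc_id, body, rev) -> new rev, raising Conflict on a stale rev.
    Both callables are hypothetical stand-ins for real HTTP calls.
    """
    for _ in range(max_attempts):
        rev, current = read(doc_id)
        if rev is not None and current == body:
            return rev  # identical content is already stored; skip the write
        try:
            return write(doc_id, body, rev)
        except Conflict:
            continue  # another writer got in first; re-read and retry
    raise Conflict(f"gave up on {doc_id} after {max_attempts} attempts")
```

The early return covers the case of several machines writing identical content: whoever loses the race sees that the stored body already matches and stops.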

Re: purge old revisions of documents

Posted by John Merrells <jo...@merrells.com>.
On Apr 11, 2010, at 9:30 PM, Randall Leeds wrote:

> On Mon, Apr 12, 2010 at 04:43, John Merrells <jo...@merrells.com> wrote:
>> I now realize that the document revisions are really just a side effect
>> of the implementation of the async multi master replication... as you
>> need them to do the collision detection.
> 
> Not quite. It's actually for collision detection on document updates,
> or MultiVersion Concurrency Control. CouchDB exposes the revision to
> the client because saving a document without specifying the latest
> revision is a conflict: the client must specify the revision she
> intends to update. Couch will not allow a document update if it has
> changed since the client last read it.


Mmm... ok, but they're much the same thing really, as both features
depend on much the same underlying state information.

For LDAP, in the multi-master case, we never provided the client with 
any consistency guarantees for the database, so an update was always
against whatever state the server had at the moment the update was
accepted. This worked fine for the kind of data that people were putting
into LDAP directories at the time, but then, sigh, people started trying 
to put shopping catalog information into it, and network configuration
information, etc, etc, ... what they really needed was a server with fewer
information model constraints that they could store their blobs and 
document-like things in... sounds familiar, huh...

Anyway, you might consider allowing the client to specify its constraint 
requirements, rather than just enforcing the strictest constraints upon it.
Not all clients care which revision of a document their updates are applied 
to. I've already run into this myself this week, as multiple machines were
all trying to update the same document on a single server with identical
content and they got into a race... 

John

-- 
John Merrells
http://johnmerrells.com
+1.415.244.5808

Re: purge old revisions of documents

Posted by Randall Leeds <ra...@gmail.com>.
On Mon, Apr 12, 2010 at 04:43, John Merrells <jo...@merrells.com> wrote:
> I now realize that the document revisions are really just a side effect
> of the implementation of the async multi master replication... as you
> need them to do the collision detection.

Not quite. It's actually for collision detection on document updates,
or MultiVersion Concurrency Control. CouchDB exposes the revision to
the client because saving a document without specifying the latest
revision is a conflict: the client must specify the revision she
intends to update. Couch will not allow a document update if it has
changed since the client last read it.
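
The rule can be illustrated with a toy in-memory model. This is not how CouchDB is implemented: `ToyDatabase`, `Conflict`, and the way the revision token is computed are all made up for illustration, and only the "N-hash" shape of the token imitates what the server hands out.

```python
import hashlib
import json


class Conflict(Exception):
    """The caller did not name the latest revision of the document."""


class ToyDatabase:
    """Keeps only the current revision of each document."""

    def __init__(self):
        self._current = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        return self._current[doc_id]

    def save(self, doc_id, body, rev=None):
        stored = self._current.get(doc_id)
        stored_rev = stored[0] if stored else None
        if rev != stored_rev:
            # The document changed since the caller read it (or the
            # caller supplied no revision for an existing document).
            raise Conflict(f"{doc_id}: expected {stored_rev}, got {rev}")
        generation = 1 if stored_rev is None else int(stored_rev.split("-")[0]) + 1
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:8]
        new_rev = f"{generation}-{digest}"
        self._current[doc_id] = (new_rev, body)
        return new_rev
```

If two clients both read revision 1 and the first saves successfully, the second client's save with the old token raises `Conflict`; it has to read again and decide what to do with the newer content.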

Re: purge old revisions of documents

Posted by John Merrells <jo...@merrells.com>.
Thanks. I'd assumed that document revisions were kept forever (à la git)
and that compaction only changed the organization of the database,
rather than the data itself.

I think the terminology threw me a bit there, and of course I brought my
own assumptions with me, and haven't read enough of the docs yet.

I now realize that the document revisions are really just a side effect 
of the implementation of the async multi master replication... as you 
need them to do the collision detection. 

But, then I wonder why, since they can't be relied upon to exist, they 
are exposed to the regular clients, and not just the clients that want 
to replicate or synchronize. Hmm, just thinking out loud there. (FWIW,
I did a lot of work on LDAP directories, and we were careful to only 
expose the replication state to clients who explicitly asked for it.)

John

-- 
John Merrells
http://johnmerrells.com
+1.415.244.5808

Re: purge old revisions of documents

Posted by Randall Leeds <ra...@gmail.com>.
On Mon, Apr 12, 2010 at 00:54, Ben Hall <be...@googlemail.com> wrote:
> But what if you wanted to keep some old versions? Doesn't compacting
> remove all of them?

This question comes up a lot. If you'd like to preserve old revisions
of documents you should save them explicitly. Saving them as their own
documents or as attachments to the newest revision are two typical
patterns for this behavior.

You should never rely on couch to preserve your old document revisions.

-Randall
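
A minimal sketch of the first pattern, saving an old revision as its own document. The function name, the `:archive:` id scheme, and the `archived_from` field are assumptions made up for this example, not a CouchDB convention. The application would save the returned document before writing the new revision of the original.

```python
import copy


def archive_copy(doc):
    """Return a standalone document that preserves this revision of doc."""
    # Drop underscore-prefixed fields (_id, _rev, ...) because they belong
    # to the original document. Attachments are not carried over here.
    archived = {
        key: copy.deepcopy(value)
        for key, value in doc.items()
        if not key.startswith("_")
    }
    # Hypothetical naming scheme: original id plus the revision it held.
    archived["_id"] = f"{doc['_id']}:archive:{doc['_rev']}"
    archived["archived_from"] = {"id": doc["_id"], "rev": doc["_rev"]}
    return archived
```

Because the archive copies are ordinary documents, compaction leaves them alone, and a retention policy such as "drop anything older than 30 days" becomes a matter of deleting those documents.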

Re: purge old revisions of documents

Posted by Ben Hall <be...@googlemail.com>.
But what if you wanted to keep some old versions? Doesn't compacting
remove all of them?

On Sun, Apr 11, 2010 at 8:42 PM, Michael Ludwig <mi...@gmx.de> wrote:
> John Merrells schrieb am 11.04.2010 um 12:21:57 (-0700):
>
>> What's the best approach for periodically purging old documents?
>> Particularly old revisions of documents.
>
> Looks like the easiest way is to compact the database, which
> discards old revisions.
>
> --
> Michael Ludwig
>

Re: purge old revisions of documents

Posted by Michael Ludwig <mi...@gmx.de>.
John Merrells schrieb am 11.04.2010 um 12:21:57 (-0700):

> What's the best approach for periodically purging old documents?
> Particularly old revisions of documents. 

Looks like the easiest way is to compact the database, which
discards old revisions.

-- 
Michael Ludwig
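
Compaction is triggered over HTTP with a POST to the database's `_compact` resource. A small sketch of building that request with the Python standard library follows; the helper name, server URL, and database name are placeholders, and sending the request needs a running server and possibly admin credentials.

```python
import urllib.parse
import urllib.request


def compaction_request(server_url, db_name):
    """Build (but do not send) the POST request that starts compaction."""
    quoted_db = urllib.parse.quote(db_name, safe="")
    url = f"{server_url.rstrip('/')}/{quoted_db}/_compact"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint takes no parameters
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server and database name.
request = compaction_request("http://localhost:5984", "mydb")
# To actually run it: urllib.request.urlopen(request)
```

Running this from cron or a similar scheduler is about as close as it gets to the background reaper asked about at the start of the thread, with the caveat that compaction is not a retention policy and discards old revisions regardless of their age.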