Posted to dev@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2009/03/07 11:55:06 UTC

Re: Best way to "migrate" (a la Rails) Couch documents

CC'ing dev@ because it is a dev issue.


On 6 Mar 2009, at 19:17, Chris Anderson wrote:

> On Fri, Mar 6, 2009 at 10:05 AM, Jason Smith <jhs@proven-corporation.com> wrote:
>> Hi, list.
>>
>> While I am happy to be learning Couch for a new project, I am still unsure
>> about some tricks that I used with Django and Rails, such as data migration:
>>
>> For example, suppose I change my code and instead of using a string
>> timestamp in my documents, I would prefer a hash with "day", "month", and
>> "year" keys.  When I deploy the new code into production, obviously I want
>> the data structures to change for all existing documents.
>>
>> So my question is: What is the preferred or recommended method to do this?
>> So far, the only thing I can think of is to write some client code to do
>> the following:
>>
>> 1. Fetch _all_docs
>> 2. For each document that requires changing, modify it
>> 3. Either PUT the new documents up one by one, or POST them to _bulk_docs,
>> depending on the situation.
>>
>> This solution doesn't strike me as particularly horrible, but I was
>> wondering if there is a better way, perhaps something server-side.
>
> This is basically the way to do it. If you want to be sure you've got
> it right, the thing to do is create a view that emits for all docs
> with the old timestamp format. Then you can process docs from that
> view, until it is empty. This way you can be sure no docs slip through
> the cracks.
>
> A migration function, written in JavaScript, and executed on the
> server, can fit the CouchDB model, it just has not been implemented
> yet. So the above is the way to proceed for the foreseeable future.
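
For reference, the client-side approach described above could be sketched
roughly like this. It assumes the field is called `timestamp` and that the
old strings are parseable by JavaScript's `Date`; both are assumptions, as
are all the function names. Inside a real design document `emit` is a global
supplied by the view server; here it is passed in so the sketch runs on its
own.

```javascript
// True for docs that still carry the old string timestamp.
// "timestamp" is an assumed field name, not one from the original post.
function needsMigration(doc) {
  return typeof doc.timestamp === "string";
}

// Map function for the "old format" view: emits one row per unmigrated doc.
function mapOldTimestamps(doc, emit) {
  if (needsMigration(doc)) {
    emit(doc._id, null);
  }
}

// Returns a migrated copy of the doc, keeping _id and _rev so it can be
// written back as an update. Unparseable timestamps are left unchanged.
function migrate(doc) {
  if (!needsMigration(doc)) {
    return doc;
  }
  var d = new Date(doc.timestamp);
  if (isNaN(d.getTime())) {
    return doc;
  }
  var copy = Object.assign({}, doc);
  copy.timestamp = {
    day: d.getUTCDate(),
    month: d.getUTCMonth() + 1, // getUTCMonth() is zero-based
    year: d.getUTCFullYear()
  };
  return copy;
}

// Builds the JSON body for POST /db/_bulk_docs from view rows that were
// fetched with include_docs=true.
function buildBulkDocs(rows) {
  return {
    docs: rows.map(function (row) {
      return migrate(row.doc);
    })
  };
}
```

A client would query the view, POST the result of `buildBulkDocs` to
`_bulk_docs`, and repeat until the view returns no rows. Docs whose
timestamp cannot be parsed stay in the view, so they would need to be
handled by hand.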

It occurred to me that the easiest way to implement this would be the
introduction of a "compaction function". Instead of sending an empty
POST request to `/db/_compact` a user sends a JSON body that
includes a compaction function and potentially options (or just
the plain JS function, doesn't matter). The compaction routine would
then launch a query server and pipe all latest documents through
the function and write out the results into the new DB.
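
To make the idea a bit more concrete, here is a rough sketch. Nothing in it
exists in CouchDB: the `compact_function` key and the `runCompaction` helper
are made-up names, and the helper only simulates locally what the compaction
routine and query server would do together.

```javascript
// Hypothetical body for the proposed POST /db/_compact request.
// "compact_function" is an invented key, not a CouchDB API.
var requestBody = {
  compact_function: "function (doc) { doc.migrated = true; return doc; }"
};

// Local stand-in for the proposed behaviour: compile the function source
// (the query server's job) and pipe the latest revision of each document
// through it, collecting what would be written to the new DB file.
function runCompaction(body, latestDocs) {
  var fn = new Function("return (" + body.compact_function + ");")();
  return latestDocs.map(function (doc) {
    // Work on a copy so the documents in the old file stay untouched.
    return fn(JSON.parse(JSON.stringify(doc)));
  });
}
```

With no function supplied, compaction would copy documents across unchanged,
as it does today.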

Of course, the current behaviour stays in place and remains
the default case. The proposed method would only help with
changing document structures in large deployments.

One problem I see is timing issues with client code and multiple
nodes. Client libs wouldn't know when to expect which document
structure, or would have to be needlessly complex to handle both.
But I think that's a deployment issue in general: CouchDB could
provide notifications to help with it, but could not solve the
problem in general.

Is this worth thinking about?

Cheers
Jan
--