Posted to dev@couchdb.apache.org by Damien Katz <da...@apache.org> on 2009/03/10 23:44:34 UTC

rep_security merge to trunk

I think the rep_security branch is looking pretty solid. I still have  
work to do to merge with Adam's recent replicator changes.

This patch breaks the file format and replication API, so replication  
with earlier versions is not possible. And the "all or nothing w/  
conflict checking" transactions are gone. Which I think is good,  
because people were relying on it without understanding that the rest of  
CouchDB doesn't support that feature.

I'd like to go ahead and merge this to trunk. Comments, suggestions  
and objections please.

-Damien

Re: rep_security merge to trunk

Posted by Jan Lehnardt <ja...@apache.org>.
On 15 Mar 2009, at 04:35, Chris Anderson wrote:

> On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org>  
> wrote:
>> For importing existing docs, I think you could just use the
>> all_or_nothing:true option and save the multiple copies of the same
>> documents and they'll all be saved, and you don't have to worry  
>> about the
>> _revisions stuff.
>>
>
> I've posted a script that copies between two running CouchDB
> instances. I'm using the all_or_nothing option. It does attachments
> inline using base64 encoding because it mostly works. I think if you
> have attachments so big that they can't be buffered, you probably want
> to avoid bulk docs anyway. If anyone desperately needs such a script
> you might be able to convince me to modify what I've written.
>
> Blog post with script and instructions here:
>
> http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk


http://wiki.apache.org/couchdb/BreakingChangesUpdateTrunkTo0Dot9

Please help expand the page.

Cheers
Jan
--


Re: rep_security merge to trunk

Posted by Jan Lehnardt <ja...@apache.org>.
On 15 Mar 2009, at 04:35, Chris Anderson wrote:

> On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org>  
> wrote:
>> For importing existing docs, I think you could just use the
>> all_or_nothing:true option and save the multiple copies of the same
>> documents and they'll all be saved, and you don't have to worry  
>> about the
>> _revisions stuff.
>>
>
> I've posted a script that copies between two running CouchDB
> instances. I'm using the all_or_nothing option. It does attachments
> inline using base64 encoding because it mostly works. I think if you
> have attachments so big that they can't be buffered, you probably want
> to avoid bulk docs anyway. If anyone desperately needs such a script
> you might be able to convince me to modify what I've written.
>
> Blog post with script and instructions here:
>
> http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk
>

Hi Chris, great work, thanks! Would it make sense to add the blog post
& script to the CouchDB wiki? I'd like to add a few notes and your blog
is still read-only :)

Cheers
Jan
--



Re: rep_security merge to trunk

Posted by Chris Anderson <jc...@apache.org>.
On Sun, Mar 15, 2009 at 7:41 AM, Chris Anderson <jc...@apache.org> wrote:
> On Sun, Mar 15, 2009 at 6:56 AM, Jeff Hinrichs - DM&T
> <du...@gmail.com> wrote:
>> On Sat, Mar 14, 2009 at 10:35 PM, Chris Anderson <jc...@apache.org> wrote:
>>> On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org> wrote:
>>>> For importing existing docs, I think you could just use the
>>>> all_or_nothing:true option and save the multiple copies of the same
>>>> documents and they'll all be saved, and you don't have to worry about the
>>>> _revisions stuff.
>>>>
>>>
>>> I've posted a script that copies between two running CouchDB
>>> instances. I'm using the all_or_nothing option. It does attachments
>>> inline using base64 encoding because it mostly works. I think if you
>>> have attachments so big that they can't be buffered, you probably want
>>> to avoid bulk docs anyway. If anyone desperately needs such a script
>>> you might be able to convince me to modify what I've written.
>>>
>>> Blog post with script and instructions here:
>>>
>>> http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk
>>>
>>
>> Chris,
>> Does this migrate conflicted documents or does it ignore them?
>>
>
> yes, it migrates conflicts. It does document requests with
>
> GET /db/docid?open_revs=all&attachments=true
>
> which gives a copy of each doc rev leaf node (that is the head rev and
> any conflict revs).
>
> Once I figured out that the same request works for conflicted and
> normal docs, the script got much simpler.
>

I forgot to mention that it just strips the _rev from the original
documents, so in the case of conflicts the winning rev could change.

If this is unacceptable for someone's application it should be possible to fix.
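
A minimal sketch of the stripping step in Python, assuming each leaf revision has already been fetched and parsed into a dict. The helper name and the sample documents are hypothetical and are not taken from the posted script:

```python
def strip_rev(doc):
    """Return a copy of doc without its _rev field; the input is left untouched."""
    return {key: value for key, value in doc.items() if key != "_rev"}


# Two leaf revisions of the same document, i.e. a conflict (made-up data).
leaves = [
    {"_id": "foo", "_rev": "old-rev-a", "value": 1},
    {"_id": "foo", "_rev": "old-rev-b", "value": 2},
]

stripped = [strip_rev(doc) for doc in leaves]
```

Because the old revs are dropped, nothing in the stripped copies records which one was the winner on the source, which is why the winning rev could change after import.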

-- 
Chris Anderson
http://jchris.mfdz.com

Re: rep_security merge to trunk

Posted by Chris Anderson <jc...@apache.org>.
On Sun, Mar 15, 2009 at 6:56 AM, Jeff Hinrichs - DM&T
<du...@gmail.com> wrote:
> On Sat, Mar 14, 2009 at 10:35 PM, Chris Anderson <jc...@apache.org> wrote:
>> On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org> wrote:
>>> For importing existing docs, I think you could just use the
>>> all_or_nothing:true option and save the multiple copies of the same
>>> documents and they'll all be saved, and you don't have to worry about the
>>> _revisions stuff.
>>>
>>
>> I've posted a script that copies between two running CouchDB
>> instances. I'm using the all_or_nothing option. It does attachments
>> inline using base64 encoding because it mostly works. I think if you
>> have attachments so big that they can't be buffered, you probably want
>> to avoid bulk docs anyway. If anyone desperately needs such a script
>> you might be able to convince me to modify what I've written.
>>
>> Blog post with script and instructions here:
>>
>> http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk
>>
>
> Chris,
> Does this migrate conflicted documents or does it ignore them?
>

Yes, it migrates conflicts. It does document requests with

GET /db/docid?open_revs=all&attachments=true

which gives a copy of each doc rev leaf node (that is the head rev and
any conflict revs).

Once I figured out that the same request works for conflicted and
normal docs, the script got much simpler.
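
For illustration, the request URL above could be built like this in Python. This is only a sketch: the function name is hypothetical, and the base URL in the usage line is an assumption about where the source instance is listening.

```python
from urllib.parse import quote, urlencode


def open_revs_url(base, db, doc_id):
    """Build the URL that asks for every leaf revision with inline attachments."""
    query = urlencode({"open_revs": "all", "attachments": "true"})
    # Percent-encode the database name and doc id so unusual characters
    # don't break the path.
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}?{query}"


# Assumed address of the source instance.
print(open_revs_url("http://localhost:5984", "db", "docid"))
# prints http://localhost:5984/db/docid?open_revs=all&attachments=true
```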

Jan, what about blog comments? ;)

-- 
Chris Anderson
http://jchris.mfdz.com

Re: rep_security merge to trunk

Posted by Jeff Hinrichs - DM&T <du...@gmail.com>.
On Sat, Mar 14, 2009 at 10:35 PM, Chris Anderson <jc...@apache.org> wrote:
> On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org> wrote:
>> For importing existing docs, I think you could just use the
>> all_or_nothing:true option and save the multiple copies of the same
>> documents and they'll all be saved, and you don't have to worry about the
>> _revisions stuff.
>>
>
> I've posted a script that copies between two running CouchDB
> instances. I'm using the all_or_nothing option. It does attachments
> inline using base64 encoding because it mostly works. I think if you
> have attachments so big that they can't be buffered, you probably want
> to avoid bulk docs anyway. If anyone desperately needs such a script
> you might be able to convince me to modify what I've written.
>
> Blog post with script and instructions here:
>
> http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk
>

Chris,
Does this migrate conflicted documents or does it ignore them?

Regards,

Jeff Hinrichs

Re: rep_security merge to trunk

Posted by Chris Anderson <jc...@apache.org>.
On Wed, Mar 11, 2009 at 4:51 PM, Damien Katz <da...@apache.org> wrote:
> For importing existing docs, I think you could just use the
> all_or_nothing:true option and save the multiple copies of the same
> documents and they'll all be saved, and you don't have to worry about the
> _revisions stuff.
>

I've posted a script that copies between two running CouchDB
instances. I'm using the all_or_nothing option. It does attachments
inline using base64 encoding because it mostly works. I think if you
have attachments so big that they can't be buffered, you probably want
to avoid bulk docs anyway. If anyone desperately needs such a script
you might be able to convince me to modify what I've written.

Blog post with script and instructions here:

http://jchrisa.net/drl/_design/sofa/_show/post/Upgrading%20CouchDB%20databases%20to%20trunk

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: rep_security merge to trunk

Posted by Damien Katz <da...@apache.org>.
On Mar 11, 2009, at 7:07 PM, Chris Anderson wrote:

> On Wed, Mar 11, 2009 at 8:34 AM, Damien Katz <da...@apache.org>  
> wrote:
>>
>> On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:
>>
>>> On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <da...@apache.org>  
>>> wrote:
>>>>
>>>> This patch breaks the file format and replication API, so  
>>>> replication
>>>> with
>>>> earlier versions is not possible.
>>>
>>> The rev format has changed. Does this mean that migrating existing
>>> data will involve getting each doc from oldDB, stripping the _rev,  
>>> and
>>> loading it into newDB?
>>
>> Yes, but it should be possible to convert the revs to the new  
>> format too.
>> But why?
>>
>>>
>>> It should be pretty straightforward to write a Python or Ruby script
>>> that does this in bulk to transfer docs. It's essentially a  
>>> version of
>>> the python dump / load tools that doesn't require putting the  
>>> whole db
>>> on disk as an intermediary.
>>>
>>> I'll volunteer but I wonder how I should handle docs with  
>>> conflicts in
>>> the oldDB?
>>
>> Oh that's why. Using the replicator API would work for that.
>>
>
> A little confused as to the plan here. Let me try to articulate:
>
> Write a script that pulls all_docs_by_seq from the old version of
> CouchDB in batches of 1000, and for each doc loads the head rev (and
> any conflict revs) into memory.
>
> Then it creates a bulk_docs POST for those docs, by stripping the rev
> from any docs that don't have conflicts, and any docs that have
> conflicts, creating a series of revs like this (pretend there are 199
> conflict revs)
>
> 1-sdfjhgsaf
> 2-asdfkjsad
> ..
> 199-asdf7tsfd
>
> and applying the revs to each doc in the conflict set. Does the rev
> ordering matter? Assuming I don't reuse the prefix number, does the
> format/length of the second rev part matter?
>
> Then using a normal POST of an object like {"docs":[...array of
> docs...]} to the /db/_bulk_docs URL (with no special query option),
> the new docs (and conflict revs) will get stored in the new DB?
>
> Or do I need to assign well-formed made up revs to the non-conflicting
> docs (they'd all get "1-foobar") and use the ?new_edits=false option
> on the bulk_docs POST ?



To use the new_edits=false, you have to specify a rev history in a doc  
_revisions property, like this:
{"new_edits": false,
  "docs": [
     {"_id": "foo", "_revisions": {"start": 2, "ids": ["133457546", "475133454"]}}
     ]}

The ids are the rev ids without the leading offset; they are sent this  
way for efficiency. Converting to regular revs, they would look like  
"2-133457546" and "1-475133454".
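
A small Python sketch of that conversion, using the example values above. The function name is hypothetical; it assumes the ids are listed newest first, as in the example:

```python
def expand_revisions(revisions):
    """Turn a _revisions object into full rev strings, newest first."""
    start = revisions["start"]
    ids = revisions["ids"]
    # ids[0] is the newest revision at position `start`; each older
    # ancestor sits one position lower.
    return [f"{start - i}-{rev_id}" for i, rev_id in enumerate(ids)]


print(expand_revisions({"start": 2, "ids": ["133457546", "475133454"]}))
# prints ['2-133457546', '1-475133454']
```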

For importing existing docs, I think you could just use the  
all_or_nothing:true option and save the multiple copies of the same  
documents and they'll all be saved, and you don't have to worry about  
the _revisions stuff.

-Damien

>
> I think getting this clear on the list will help everyone's
> understanding of the new bulk_docs semantics. (I don't plan to include
> in my migrator the ability to transfer any docs which would be lost on
> the source DB during compaction... only the HEAD rev and any conflicts
> will be transferred.)
>
> Chris
>
> ps I tagged trunk as bulk_transactions (maybe coulda picked a better
> name) so we have a record of the last point of 0.9 development that
> had the old semantics. Please don't use this tag.
>
> -- 
> Chris Anderson
> http://jchris.mfdz.com


Re: rep_security merge to trunk

Posted by Chris Anderson <jc...@apache.org>.
On Wed, Mar 11, 2009 at 8:34 AM, Damien Katz <da...@apache.org> wrote:
>
> On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:
>
>> On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <da...@apache.org> wrote:
>>>
>>> This patch breaks the file format and replication API, so replication
>>> with
>>> earlier versions is not possible.
>>
>> The rev format has changed. Does this mean that migrating existing
>> data will involve getting each doc from oldDB, stripping the _rev, and
>> loading it into newDB?
>
> Yes, but it should be possible to convert the revs to the new format too.
> But why?
>
>>
>> It should be pretty straightforward to write a Python or Ruby script
>> that does this in bulk to transfer docs. It's essentially a version of
>> the python dump / load tools that doesn't require putting the whole db
>> on disk as an intermediary.
>>
>> I'll volunteer but I wonder how I should handle docs with conflicts in
>> the oldDB?
>
> Oh that's why. Using the replicator API would work for that.
>

A little confused as to the plan here. Let me try to articulate:

Write a script that pulls all_docs_by_seq from the old version of
CouchDB in batches of 1000, and for each doc loads the head rev (and
any conflict revs) into memory.

Then it creates a bulk_docs POST for those docs, by stripping the rev
from any docs that don't have conflicts, and any docs that have
conflicts, creating a series of revs like this (pretend there are 199
conflict revs)

1-sdfjhgsaf
2-asdfkjsad
..
199-asdf7tsfd

and applying the revs to each doc in the conflict set. Does the rev
ordering matter? Assuming I don't reuse the prefix number, does the
format/length of the second rev part matter?

Then using a normal POST of an object like {"docs":[...array of
docs...]} to the /db/_bulk_docs URL (with no special query option),
the new docs (and conflict revs) will get stored in the new DB?

Or do I need to assign well-formed made up revs to the non-conflicting
docs (they'd all get "1-foobar") and use the ?new_edits=false option
on the bulk_docs POST ?
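
The batching step of the plan above could look roughly like this in Python. It is a sketch only: `batches` is a hypothetical helper, and the ids would really come from the old server's all_docs_by_seq listing rather than a `range`.

```python
from itertools import islice


def batches(items, size=1000):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in for the ids pulled from the old database.
doc_ids = range(2500)
print([len(chunk) for chunk in batches(doc_ids)])
# prints [1000, 1000, 500]
```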

I think getting this clear on the list will help everyone's
understanding of the new bulk_docs semantics. (I don't plan to include
in my migrator the ability to transfer any docs which would be lost on
the source DB during compaction... only the HEAD rev and any conflicts
will be transferred.)

Chris

ps I tagged trunk as bulk_transactions (maybe coulda picked a better
name) so we have a record of the last point of 0.9 development that
had the old semantics. Please don't use this tag.

-- 
Chris Anderson
http://jchris.mfdz.com

Re: rep_security merge to trunk

Posted by Damien Katz <da...@apache.org>.
On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:

> On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <da...@apache.org>  
> wrote:
>>
>> This patch breaks the file format and replication API, so  
>> replication with
>> earlier versions is not possible.
>
> The rev format has changed. Does this mean that migrating existing
> data will involve getting each doc from oldDB, stripping the _rev, and
> loading it into newDB?

Yes, but it should be possible to convert the revs to the new format  
too. But why?

>
> It should be pretty straightforward to write a Python or Ruby script
> that does this in bulk to transfer docs. It's essentially a version of
> the python dump / load tools that doesn't require putting the whole db
> on disk as an intermediary.
>
> I'll volunteer but I wonder how I should handle docs with conflicts in
> the oldDB?

Oh that's why. Using the replicator API would work for that.

-Damien

Re: rep_security merge to trunk

Posted by Chris Anderson <jc...@apache.org>.
On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <da...@apache.org> wrote:
>
> This patch breaks the file format and replication API, so replication with
> earlier versions is not possible.

The rev format has changed. Does this mean that migrating existing
data will involve getting each doc from oldDB, stripping the _rev, and
loading it into newDB?

It should be pretty straightforward to write a Python or Ruby script
that does this in bulk to transfer docs. It's essentially a version of
the python dump / load tools that doesn't require putting the whole db
on disk as an intermediary.

I'll volunteer but I wonder how I should handle docs with conflicts in
the oldDB?

Chris

-- 
Chris Anderson
http://jchris.mfdz.com

Re: rep_security merge to trunk

Posted by Damien Katz <da...@apache.org>.
Heh. Hit send too soon. Anyway, I think you get the idea.

-Damien

On Mar 11, 2009, at 6:46 PM, Damien Katz wrote:

> I've added all_or_nothing transactions. It has the behavior that if  
> a document doesn't pass validation or there is a crash during  
> update, then no docs are saved. However there is no conflict  
> checking, so some of thconflicts..

>
> Adding all_or_nothing option to bulk docs. If a doc doesn't pass  
> validation or there is failure during update, then no docs are  
> saved. However, there is no conflict checking, if all docs validate  
> but some or all doc updates are conflicts, they are saved as  
> conflicts (maybe as winner or as loser) in a single transaction  
> regardless, all docs are saved and no errors returned the client.
>
> Now we just need merge to trunk.
>
> -Damien
>
>
> On Mar 11, 2009, at 11:31 AM, Damien Katz wrote:
>
>> I'm going to look into adding a "force conflicts" option today.
>>
>> -Damien
>>
>>
>> On Mar 10, 2009, at 7:01 PM, Jan Lehnardt wrote:
>>
>>>
>>> On 10 Mar 2009, at 23:44, Damien Katz wrote:
>>>
>>>> I think the rep_security branch is looking pretty solid. I still  
>>>> have work to do to merge with Adam's recent replicator changes.
>>>>
>>>> This patch breaks the file format and replication API, so  
>>>> replication with earlier versions is not possible. And the "all  
>>>> or nothing w/ conflict checking" transactions are gone. Which I  
>>>> think is good, because people were relying on it without  
>>>> understanding the rest of CouchDB doesn't support that feature.
>>>>
>>>> I'd like to go ahead and merge this to trunk. Comments,  
>>>> suggestions and objections please.
>>>
>>> I have an app that could benefit of the other variant of bulk  
>>> transactions that you offered in the initial proposal, namely  
>>> having all writes go through, regardless if they create conflicts  
>>> or not. Replication already offers this and a bulk request with  
>>> the `new_edits:false` flag set will give me that behaviour, but  
>>> not for documents that don't have a `_rev` member. CouchDB crashes  
>>> when I send it. I believe the patch to be not too hard (adding new  
>>> `_rev`s where they are missing) or I missing anything? I'm happy  
>>> to come up with a patch, if there are no objections. I also don't  
>>> think this would block merging the branch to trunk.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>
>


Re: rep_security merge to trunk

Posted by Damien Katz <da...@apache.org>.
I've added all_or_nothing transactions. It has the behavior that if a  
document doesn't pass validation or there is a crash during update,  
then no docs are saved. However there is no conflict checking, so some  
of thconflicts..

Adding all_or_nothing option to bulk docs. If a doc doesn't pass  
validation or there is a failure during update, then no docs are saved.  
However, there is no conflict checking: if all docs validate but some  
or all doc updates are conflicts, they are saved as conflicts (maybe  
as winner or as loser) in a single transaction regardless. All docs  
are saved and no errors are returned to the client.
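
As a rough Python illustration, a bulk docs request body with this option might be assembled as below. This assumes the option travels in the JSON body next to the docs array, in the same way as the new_edits example elsewhere in this thread; the helper name is hypothetical.

```python
import json


def bulk_docs_body(docs, all_or_nothing=True):
    """Serialize a _bulk_docs request body (assumed shape, see note above)."""
    return json.dumps({"all_or_nothing": all_or_nothing, "docs": list(docs)})


# Two copies of the same document: with no conflict checking, both are
# expected to be saved, one of them as a conflict.
body = bulk_docs_body([
    {"_id": "foo", "value": 1},
    {"_id": "foo", "value": 2},
])
print(body)
```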

Now we just need to merge to trunk.

-Damien


On Mar 11, 2009, at 11:31 AM, Damien Katz wrote:

> I'm going to look into adding a "force conflicts" option today.
>
> -Damien
>
>
> On Mar 10, 2009, at 7:01 PM, Jan Lehnardt wrote:
>
>>
>> On 10 Mar 2009, at 23:44, Damien Katz wrote:
>>
>>> I think the rep_security branch is looking pretty solid. I still  
>>> have work to do to merge with Adam's recent replicator changes.
>>>
>>> This patch breaks the file format and replication API, so  
>>> replication with earlier versions is not possible. And the "all or  
>>> nothing w/ conflict checking" transactions are gone. Which I think  
>>> is good, because people were relying on it without understanding  
>>> the rest of CouchDB doesn't support that feature.
>>>
>>> I'd like to go ahead and merge this to trunk. Comments,  
>>> suggestions and objections please.
>>
>> I have an app that could benefit of the other variant of bulk  
>> transactions that you offered in the initial proposal, namely  
>> having all writes go through, regardless if they create conflicts  
>> or not. Replication already offers this and a bulk request with the  
>> `new_edits:false` flag set will give me that behaviour, but not for  
>> documents that don't have a `_rev` member. CouchDB crashes when I  
>> send it. I believe the patch to be not too hard (adding new `_rev`s  
>> where they are missing) or I missing anything? I'm happy to come up  
>> with a patch, if there are no objections. I also don't think this  
>> would block merging the branch to trunk.
>>
>> Cheers
>> Jan
>> --
>>
>


Re: rep_security merge to trunk

Posted by Damien Katz <da...@apache.org>.
I'm going to look into adding a "force conflicts" option today.

-Damien


On Mar 10, 2009, at 7:01 PM, Jan Lehnardt wrote:

>
> On 10 Mar 2009, at 23:44, Damien Katz wrote:
>
>> I think the rep_security branch is looking pretty solid. I still  
>> have work to do to merge with Adam's recent replicator changes.
>>
>> This patch breaks the file format and replication API, so  
>> replication with earlier versions is not possible. And the "all or  
>> nothing w/ conflict checking" transactions are gone. Which I think  
>> is good, because people were relying on it without understanding  
>> the rest of CouchDB doesn't support that feature.
>>
>> I'd like to go ahead and merge this to trunk. Comments, suggestions  
>> and objections please.
>
> I have an app that could benefit of the other variant of bulk  
> transactions that you offered in the initial proposal, namely having  
> all writes go through, regardless if they create conflicts or not.  
> Replication already offers this and a bulk request with the  
> `new_edits:false` flag set will give me that behaviour, but not for  
> documents that don't have a `_rev` member. CouchDB crashes when I  
> send it. I believe the patch to be not too hard (adding new `_rev`s  
> where they are missing) or I missing anything? I'm happy to come up  
> with a patch, if there are no objections. I also don't think this  
> would block merging the branch to trunk.
>
> Cheers
> Jan
> --
>


Re: rep_security merge to trunk

Posted by Jan Lehnardt <ja...@apache.org>.
On 10 Mar 2009, at 23:44, Damien Katz wrote:

> I think the rep_security branch is looking pretty solid. I still  
> have work to do to merge with Adam's recent replicator changes.
>
> This patch breaks the file format and replication API, so  
> replication with earlier versions is not possible. And the "all or  
> nothing w/ conflict checking" transactions are gone. Which I think  
> is good, because people were relying on it without understanding the  
> rest of CouchDB doesn't support that feature.
>
> I'd like to go ahead and merge this to trunk. Comments, suggestions  
> and objections please.

I have an app that could benefit from the other variant of bulk  
transactions that you offered in the initial proposal, namely having  
all writes go through, regardless if they create conflicts or not.  
Replication already offers this and a bulk request with the  
`new_edits:false` flag set will give me that behaviour, but not for  
documents that don't have a `_rev` member. CouchDB crashes when I send  
it. I believe the patch to be not too hard (adding new `_rev`s where  
they are missing), or am I missing anything? I'm happy to come up with a  
patch, if there are no objections. I also don't think this would block  
merging the branch to trunk.

Cheers
Jan
--