Posted to dev@openoffice.apache.org by Greg Stein <gs...@gmail.com> on 2011/06/29 05:27:03 UTC

Building a single Hg repository (was: An svn question)

On Mon, Jun 27, 2011 at 05:42, Jens-Heiner Rechtien <jh...@web.de> wrote:
> On 06/27/2011 01:08 AM, Greg Stein wrote:
>...
>>> Merging them in hg is easy, just pull/merge. But ... we are talking about
>>> a hundred or so CWSs here. In all kinds of readiness states.
>>> http://hg.services.openoffice.org
>>>
>>> If we merge them now, we won't have a working OOo for a long time. Now,
>>> we could skip the merge part and leave the heads "dangling". Hg heads are
>>> kinda
>>
>> That's what I was thinking. And then map these "dangling" heads to
>> individual branches in svn.
>>
>>> anonymous branches in Mercurial. Don't know if a repository with multiple
>>> heads can be converted to SVN. Probably quite tricky (the tool would need
>>> to generate sensible names for the different heads).
>>
>> If the converter tool doesn't have the feature, it seems pretty
>> straight-forward to add code to either provide a name mapping for
>> them, or auto-generate names.
>
> The anonymous heads could be marked with the cws name as a mercurial
> bookmark, just after the individual pull step. That way the information is
> at least already in the all-in-one hg repository. A smart converter could
> use them to generate svn branch names. Something along these lines:
>
> $ cd <all-in-one-repository>
> $ hg pull ../cws/os151
>
> ... the latest changeset of CWS os151 is now tip
>
> $ hg bookmark -r tip os151
>
> $ hg bookmarks
>   os151                     276718:f4d674e63830
>   ....

Great. Thanks for the pointers.

I'm going to start updating the single-hg.sh (see tools/dev/) with
this stuff. I'd appreciate it if you could keep an eye on the commits and
correct me where I stray. I've never used Mercurial before.

I've read up on the differences between tags, branches, local tags,
and bookmarks. I agree that bookmarks seem appropriate for
this purpose. We could technically use a tag since no further work
would be done in this "single" repository. However, we may be able to
use the bookmarks as an indicator for branch construction (vs a static
copy in tags/).
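A minimal sketch of the pull-then-bookmark procedure quoted above, assuming Mercurial 1.8 or later (where bookmarks are part of the core) and a local directory of CWS clones such as ../cws/os151. The function name `bookmark_cws_heads` is hypothetical and not part of single-hg.sh. Instead of bookmarking `tip` after each pull, it reads the CWS's own tip node first, so the bookmark cannot land on the wrong changeset when a pull brings in nothing new.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of single-hg.sh): pull each CWS clone into
# one all-in-one repository and bookmark its head with the CWS name.
# Assumes Mercurial >= 1.8, where bookmarks are part of the core.
bookmark_cws_heads() {
    local repo=$1
    shift
    local cws name node
    for cws in "$@"; do
        name=$(basename "$cws")                  # ../cws/os151 -> os151
        # Look up the CWS's own tip first, so the bookmark does not depend
        # on what 'tip' happens to be in the all-in-one repo after the pull.
        node=$(hg -R "$cws" log -r tip --template '{node}') || return 1
        # Pulling unmerged work just adds another (anonymous) head.
        hg -R "$repo" pull "$cws" || return 1
        # -f moves an existing bookmark, so the loop can be re-run safely.
        hg -R "$repo" bookmark -f -r "$node" "$name" || return 1
    done
}
```

Running `bookmark_cws_heads <all-in-one-repository> ../cws/os151 ../cws/ab78` and then `hg bookmarks` in that repository should list one bookmark per CWS.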

I suspect that a lot of work will happen on the Convert extension to
Mercurial to manage this transition :-)

One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
are about 250 CWSs hosted at OOo. If the average holds, I would need
to clone 700 GB of material down to my system to perform the
integration.

Am I missing something? Is there a better way? etc.

Thanks,
-g

Re: Building a single Hg repository (was: An svn question)

Posted by Greg Stein <gs...@gmail.com>.

On Tue, Jun 28, 2011 at 23:27, Greg Stein <gs...@gmail.com> wrote:
> [... snip ...]
>
> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
> My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
> are about 250 CWSs hosted at OOo. If the average holds, I would need
> to clone 700 GB of material down to my system to perform the
> integration.
>
> Am I missing something? Is there a better way? etc.

For the size issue, I've been pointed at:
http://mercurial.selenic.com/wiki/RelinkExtension
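A minimal sketch of putting the relink extension to use, assuming the CWS clones and a DEV300 clone sit on the same filesystem; the function name `relink_cws_clones` and the paths are hypothetical. `hg relink ORIGIN` re-creates hardlinks to ORIGIN for store files that are identical in both repositories, and `--config extensions.relink=` enables the bundled extension for one command without editing .hgrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reclaim disk space by hardlinking each CWS clone's
# store against a master clone on the same filesystem.
relink_cws_clones() {
    local master=$1
    shift
    local cws
    for cws in "$@"; do
        # Enable the bundled relink extension just for this one command.
        hg -R "$cws" --config extensions.relink= relink "$master" || return 1
    done
}
```

Comparing `du -sh` on the clones before and after should show how much space the hardlinks recover.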


Cheers,
-g

Re: Building a single Hg repository (was: An svn question)

Posted by Mathias Bauer <Ma...@gmx.net>.

On 29.06.2011 12:17, Greg Stein wrote:

> I don't understand this part. DEV300 is the master repo, right? Are
> you saying that there is a *separate* repository for the l10n data?

Yes. We have separated the localization repos from the "other code"
repos. There have been many good reasons to do that, but most probably 
it doesn't make a lot of sense to repeat them here.

IMHO we don't need the history for l10n files, so the easiest way to get
these files would be to take a source tarball and commit it to svn.

We can do that later; the l10n files are not needed for a build. The
build result will then be en-US only.
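Importing the translations that way could look roughly like the following sketch. The tarball name and the svn URL are placeholders, the function name `import_l10n_tarball` is hypothetical, and it assumes a gzip-compressed tarball. `svn import` adds an unversioned tree to the repository in a single commit, which matches the idea of bringing the l10n files over without their history.

```shell
#!/usr/bin/env bash
# Hypothetical helper: commit the l10n files from a (gzip-compressed) source
# tarball to svn in a single revision, without any Mercurial history.
import_l10n_tarball() {
    local tarball=$1 url=$2
    local tmp rc
    tmp=$(mktemp -d) || return 1
    if ! tar -xzf "$tarball" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    # 'svn import' adds the unpacked, unversioned tree under $url in one commit.
    svn import "$tmp" "$url" -m "Import l10n files from $(basename "$tarball")"
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}
```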

Regards,
Mathias

Re: Building a single Hg repository (was: An svn question)

Posted by Greg Stein <gs...@gmail.com>.

On Jun 29, 2011 8:38 AM, "Jens-Heiner Rechtien" <jh...@web.de> wrote:
>
> On 06/29/2011 12:17 PM, Greg Stein wrote:
>
> [... snip ...]
>
>
>> Um. I see kind of a pot shot at svn here. I'll give you the benefit of
>> the doubt, rather than get cranky. The local pristines (beyond just
>> diff) mean that commits can send deltas, rather than the whole file.
>> And when you're working with 4G files (oh, wait! Hg can't deal with
>> files that size!) then sending a delta is very important.
>
>
> Of course, anyone who actually commits 4G files to a *source* code
> repository should be tarred and feathered.

Next time you play Quake, send a "thanks" to Subversion. id Software has
been using svn for years to store image and video assets. Subversion was
*designed* for more uses than just us code monkeys.

I could name others, but it would be tiresome. Fact is that svn can store
huge amounts of data. And people use that feature. Daily.

-g

Re: Building a single Hg repository (was: An svn question)

Posted by Jens-Heiner Rechtien <jh...@web.de>.

On 06/29/2011 12:17 PM, Greg Stein wrote:

[... snip ...]

> Um. I see kind of a pot shot at svn here. I'll give you the benefit of
> the doubt, rather than get cranky. The local pristines (beyond just
> diff) mean that commits can send deltas, rather than the whole file.
> And when you're working with 4G files (oh, wait! Hg can't deal with
> files that size!) then sending a delta is very important.

Of course, anyone who actually commits 4G files to a *source* code 
repository should be tarred and feathered.

[... snip ...]

Heiner


-- 
Jens-Heiner Rechtien

Re: Building a single Hg repository (was: An svn question)

Posted by Michael Stahl <ms...@openoffice.org>.

On 29.06.2011 12:17, Greg Stein wrote:
> On Wed, Jun 29, 2011 at 05:04, Michael Stahl <ms...@openoffice.org> wrote:
>> On 29.06.2011 05:27, Greg Stein wrote:

>> but HG supports hardlinks between repositories (in newer versions even on
>> win32), so you can "hg clone" the master on the same filesystem and then
>> pull in the CWS, and it will be _much_ faster and take _much_ less
>
> Yah. This is awesome, and will make pulling CWSs much quicker. I'll
> bake that into our scripts.
>
>> additional space (in fact, less than the useful-only-for-diff "pristine
>> source" in a SVN working copy would take).
>
> Um. I see kind of a pot shot at svn here. I'll give you the benefit of
> the doubt, rather than get cranky. The local pristines (beyond just
> diff) mean that commits can send deltas, rather than the whole file.
> And when you're working with 4G files (oh, wait! Hg can't deal with
> files that size!) then sending a delta is very important.

i think usually HG stores and transmits binary diffs for changes.

but you're right, SVN isn't totally useless: the remaining niche where 
SVN actually has an advantage over modern DSCMs is for versioning huge 
binary files: you don't want to have all revisions of those in a working 
copy.

we worked around that limitation by storing our binary blobs (i.e.,
external library tarballs) on an FTP server (see fetch_tarballs.sh).

>> oh, just noticed it doesn't include all the l10n repositories.
>> i think we need those as well.
>> with Branchmirror probably a second config file is required, because l10n is
>> a separate master repo.
>> (since DEV300m101 a master/CWS consists of 2 repositories, one for all the
>> bulky translations, one for the stuff i work on :)
>
> I don't understand this part. DEV300 is the master repo, right? Are
> you saying that there is a *separate* repository for the l10n data?

yes, exactly.
the l10n stuff has huge changes, and developers don't ever need it, so 
we complained about all this wasted hard disk space/bandwidth, and now 
finally releng gave us 2 master repos :)

these are the master l10n repos:

master_l10n/DEV300
master_l10n/OOO340

for CWS it looks like this (guess most of these won't contain changes):

cws_l10n/sw34bf06

-- 
"I suppose I should learn Lisp, but it seems so foreign."
  -- Paul Graham, Nov 1983

Re: Building a single Hg repository (was: An svn question)

Posted by Greg Stein <gs...@gmail.com>.

On Wed, Jun 29, 2011 at 05:04, Michael Stahl <ms...@openoffice.org> wrote:
> On 29.06.2011 05:27, Greg Stein wrote:
>...
>> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
>> My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
>> are about 250 CWSs hosted at OOo. If the average holds, I would need
>> to clone 700 GB of material down to my system to perform the
>> integration.
>
> i guess your DEV300 includes a working copy, and ab78 does not?
> "du" says 2.4 GB for .hg on ext3 filesystem here.

Nope. I also had a full working copy :-) ... it wasn't until later
that I learned about 'hg clone -U'.

I'm a total n00b with Hg. heh.

>> Am I missing something? Is there a better way? etc.
>
> you're doing it wrong :)

Thought so. I jumped onto the #mercurial channel and spoke with a
couple people there. In just that short time, I learned quite a bit.
Specifically, the hardlinks that you mentioned, along with the relink
extension.

> in principle the size of a CWS is on the same order as the master, because
> it's just another HG repository.

Right. If you link them together, which I didn't understand how to do.
(but have now learned)

> but HG supports hardlinks between repositories (in newer versions even on
> win32), so you can "hg clone" the master on the same filesystem and then
> pull in the CWS, and it will be _much_ faster and take _much_ less

Yah. This is awesome, and will make pulling CWSs much quicker. I'll
bake that into our scripts.

> additional space (in fact, less than the useful-only-for-diff "pristine
> source" in a SVN working copy would take).

Um. I see kind of a pot shot at svn here. I'll give you the benefit of
the doubt, rather than get cranky. The local pristines (beyond just
diff) mean that commits can send deltas, rather than the whole file.
And when you're working with 4G files (oh, wait! Hg can't deal with
files that size!) then sending a delta is very important.

> there is an extension written by my former colleague Bjoern Michaelsen that
> can mirror all the CWSes automatically:
>
> http://mercurial.selenic.com/wiki/BranchmirrorExtension
>
> IIRC all CWSes that actually include changesets not in the master take less
> than 100GB.
> only issue is that Branchmirror does not check "hg incoming" before cloning
> for a CWS, so you end up with some useless repos identical to master.

Cool. I'll take a look at this. Maybe this will be important for our
conversion scripts. I'm still learning while I assemble that stuff.
All this help is awesome, as I really don't know Best Practices for
Hg.

> i'll attach the .hgrc i used; it excludes a lot of CWSes that are marked as
> "integrated" or "deleted" in EIS (which is a database and a web UI to manage
> CWS metadata); these are also automatically deleted on the HG server after
> some time.

I've checked in a list of all the CWSs from the Oracle repository. If
there are some CWSs that we *know* that we don't want, then please
comment them out from that file (and preferably, with a short
explanation why). That will definitely help the overall conversion
process, if we don't have to process a bunch of the CWS repositories.
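Honoring such comments when a script reads the list could look like the sketch below. It assumes one CWS name per line and `#` as the comment marker; the function name `active_cws` is hypothetical, and the checked-in list may use a different format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CWS names that are still active in a list
# file where '#' starts a comment, e.g.
#   ab78
#   # somecws    reason why it is not needed
active_cws() {
    # Strip comments, then all whitespace, then drop lines left empty.
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1"
}
```

Keeping the explanation on the same line as the commented-out name means the reason travels with the entry, and the script never sees either.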

> oh, just noticed it doesn't include all the l10n repositories.
> i think we need those as well.
> with Branchmirror probably a second config file is required, because l10n is
> a separate master repo.
> (since DEV300m101 a master/CWS consists of 2 repositories, one for all the
> bulky translations, one for the stuff i work on :)

I don't understand this part. DEV300 is the master repo, right? Are
you saying that there is a *separate* repository for the l10n data?

> of course cloning all the CWSes individually is different from what Heiner
> suggested above, but i think it's useful as a backup, and you can experiment
> much better if you have this as an intermediate step and don't have to
> download everything again.

Right. The script that I've started assumes you've cloned all of these
repositories locally. We need to be able to work through this process
as a community. That means developing some scripts so that *everybody*
can replicate what we're going to extract from the Oracle repositories
and import into Apache.

> my totally unsubstantiated guess is that one HG repo with all CWSes pulled
> in would be ~3 GB.

Wow. Cool. I was very worried about total space for these things.
If we keep it to repository-only (e.g. "clone -U") and ensure hardlinks
are used, then yah: space and time should be greatly reduced.

I appreciate the pointers! The problem seems much more approachable.

Cheers,
-g

Re: Building a single Hg repository (was: An svn question)

Posted by Michael Stahl <ms...@openoffice.org>.

On 29.06.2011 05:27, Greg Stein wrote:
> [... snip ...]
>
> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
> My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
> are about 250 CWSs hosted at OOo. If the average holds, I would need
> to clone 700 GB of material down to my system to perform the
> integration.

i guess your DEV300 includes a working copy, and ab78 does not?
"du" says 2.4 GB for .hg on ext3 filesystem here.

> Am I missing something? Is there a better way? etc.

you're doing it wrong :)

in principle the size of a CWS is on the same order as the master, 
because it's just another HG repository.

but HG supports hardlinks between repositories (in newer versions even 
on win32), so you can "hg clone" the master on the same filesystem and 
then pull in the CWS, and it will be _much_ faster and take _much_ less 
additional space (in fact, less than the useful-only-for-diff "pristine 
source" in a SVN working copy would take).
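A minimal sketch of that workflow, assuming a local DEV300 clone and CWS repositories reachable under one base URL or path; the `base` argument, the destination layout and the function name `mirror_cws` are hypothetical. `hg clone -U` skips the working copy, and a clone between local paths on the same filesystem hardlinks the store, so only the CWS-specific changesets cost extra space.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one repository per CWS from a local master
# clone, so the history shared with the master is hardlinked, not copied.
mirror_cws() {
    local master=$1 base=$2 dest=$3
    shift 3
    local name
    mkdir -p "$dest" || return 1
    for name in "$@"; do
        # -U skips the working copy; a clone between local paths on the same
        # filesystem hardlinks the repository store where possible.
        hg clone -U "$master" "$dest/$name" || return 1
        # Only changesets the master does not already have are transferred.
        hg -R "$dest/$name" pull "$base/$name" || return 1
    done
}
```

For example, `mirror_cws ~/ooo/DEV300 http://hg.services.openoffice.org/cws ~/ooo/cws ab78`, where the `cws/` path on the server is an assumption about its layout.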

there is an extension written by my former colleague Bjoern Michaelsen 
that can mirror all the CWSes automatically:

http://mercurial.selenic.com/wiki/BranchmirrorExtension

IIRC all CWSes that actually include changesets not in the master take 
less than 100GB.
only issue is that Branchmirror does not check "hg incoming" before 
cloning for a CWS, so you end up with some useless repos identical to 
master.
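The missing check can be sketched as follows, assuming a local master clone. `hg incoming` exits with status 0 when the source has changesets the local repository lacks and 1 when it has none, so CWSs that add nothing can be skipped before anything is cloned. The function names are hypothetical, and this is not how Branchmirror itself works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if a CWS has changesets that the
# master clone lacks. 'hg incoming' returns 0 when there are incoming
# changesets and 1 when there are none; other statuses indicate errors.
cws_has_changes() {
    local master=$1 cws=$2
    hg -R "$master" incoming -q "$cws" >/dev/null
}

# Print only the CWS sources that are worth cloning; stop on hg errors.
select_cws_with_changes() {
    local master=$1
    shift
    local cws rc
    for cws in "$@"; do
        cws_has_changes "$master" "$cws" && rc=0 || rc=$?
        case $rc in
            0) printf '%s\n' "$cws" ;;
            1) ;;                  # nothing the master lacks: skip it
            *) return "$rc" ;;
        esac
    done
}
```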

i'll attach the .hgrc i used; it excludes a lot of CWSes that are marked 
as "integrated" or "deleted" in EIS (which is a database and a web UI to 
manage CWS metadata); these are also automatically deleted on the HG 
server after some time.

oh, just noticed it doesn't include all the l10n repositories.
i think we need those as well.
with Branchmirror probably a second config file is required, because 
l10n is a separate master repo.
(since DEV300m101 a master/CWS consists of 2 repositories, one for all 
the bulky translations, one for the stuff i work on :)

of course cloning all the CWSes individually is different from what 
Heiner suggested above, but i think it's useful as a backup, and you can 
experiment much better if you have this as an intermediate step and 
don't have to download everything again.

my totally unsubstantiated guess is that one HG repo with all CWSes 
pulled in would be ~3 GB.

regards,
  michael

-- 
 From the Plan 9 FAQ:
Q: Where did the name come from?
A: It was chosen in the Bell Labs tradition of selecting
    names that make marketeers wince.

Re: Building a single Hg repository (was: An svn question)

Posted by Jens-Heiner Rechtien <jh...@web.de>.

Hi Greg,

On 06/29/2011 05:27 AM, Greg Stein wrote:
> [... snip ...]
>
> Great. Thanks for the pointers.
>
> I'm going to start updating the single-hg.sh (see tools/dev/) with
> this stuff. I'd appreciate it if you could keep an eye on the commits and
> correct me where I stray. I've never used Mercurial before.

Will do.

>
> [... snip ...]
>
> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
> My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
> are about 250 CWSs hosted at OOo. If the average holds, I would need
> to clone 700 GB of material down to my system to perform the
> integration.

Several posters have already answered this one; let me just add that the
first 263206 revisions in all repositories are guaranteed to be identical,
so there is no need to ever download them twice over the wire. Actually,
almost all of the open CWSs contain only very few new changesets, and
those are all you need to pull.

If you have a copy of DEV300 or OOO340 on your system, you could either
pull everything into one repository, resulting in one head per CWS, or
create one repository per CWS with:

$ hg clone -U DEV300 <foo>   # uses hardlinks on Unix
$ cd <foo>
$ hg pull .../cws/<foo>

Which approach is better depends on how you want to proceed.

The OOo Mercurial server hg.services.openoffice.org keeps all the
currently open CWSs in less than 400 GB of disk space, but there,
hardlinks are only used during CWS creation. With perfect hardlinking it
would be significantly less. An all-in-one repository will be around
2.5 GiB, plus 1.7 GiB or so for the working tree.

Heiner

-- 
Jens-Heiner Rechtien

Re: Building a single Hg repository

Posted by Mathias Bauer <Ma...@gmx.net>.

On 29.06.2011 12:59, Michael Stahl wrote:
> On 29.06.2011 12:29, Mathias Bauer wrote:
>> As your local hg repo is just an intermediate step from where you export
>> to svn, you could pull all cws into one repository. The majority of the
>> different repositories (master or cws) consist of almost the same
>> changesets, so pulling into one repo will save a lot of disk space.
>>
>> This has been tried by several people and it worked for them.
>>
>> Someone has created a Mercurial extension that makes this process pretty
>> easy:
>>
>> http://mercurial.selenic.com/wiki/BranchmirrorExtension
>>
>> As some of our cws are based on the DEV300 repo and some are based on
>> the OOO340 repo, you will need two repos, or someone needs to copy the
>> cws from DEV300 to OOO340 beforehand.
>
> actually OOO340 should be sufficient, because it contains all changes
> from DEV300.
> why would copying to ooo340 (what do you mean by that? merging in
> OOO340?) be required?

You are right. As OOO340 contains everything from DEV300, there 
shouldn't be a problem.

Regards,
Mathias

Re: Building a single Hg repository

Posted by Michael Stahl <ms...@openoffice.org>.

On 29.06.2011 12:29, Mathias Bauer wrote:
> As your local hg repo is just an intermediate step from where you export
> to svn, you could pull all cws into one repository. The majority of the
> different repositories (master or cws) consist of almost the same
> changesets, so pulling into one repo will save a lot of disk space.
>
> This has been tried by several people and it worked for them.
>
> Someone has created a Mercurial extension that makes this process pretty
> easy:
>
> http://mercurial.selenic.com/wiki/BranchmirrorExtension
>
> As some of our cws are based on the DEV300 repo and some are based on
> the OOO340 repo, you will need two repos, or someone needs to copy the
> cws from DEV300 to OOO340 beforehand.

actually OOO340 should be sufficient, because it contains all changes 
from DEV300.
why would copying to ooo340 (what do you mean by that? merging in 
OOO340?) be required?

if you pull a CWS into an OOO340 clone you may have 2 heads (the OOO340
head and the CWS head), but if the CWS is not rebased to the latest 
DEV300 milestone+masterfixes, you have the _exact_ same problem if you 
pull into a DEV300 clone.

and in any case, pulling all (OOO340 clone+CWS pull) repos into a single 
OOO340 repo should add +1 head per CWS, which could be (in principle) 
converted to a SVN branch.
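A minimal sketch of that head-to-branch mapping, assuming the CWS heads carry bookmarks as suggested earlier in the thread and a Mercurial version whose templates provide the `{bookmarks}` keyword. The `branches/` prefix, the `head-<node>` fallback for unnamed heads and the function name are hypothetical choices for a converter, not features of an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<svn branch path> <hg node>" for every head of
# the all-in-one repository. A head with a bookmark is named after its first
# bookmark; an anonymous head gets an auto-generated "head-<node>" name.
heads_to_svn_branches() {
    local repo=$1
    local node marks name
    hg -R "$repo" heads --template '{node|short} {bookmarks}\n' |
    while read -r node marks; do
        name=${marks%% *}                        # first bookmark, if any
        if [ -z "$name" ]; then
            name="head-$node"
        fi
        printf 'branches/%s %s\n' "$name" "$node"
    done
}
```

The master's own head shows up in this list as well and would normally map to trunk rather than to a branch, so a real converter would need to treat it specially.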

-- 
recursion, n:
         See recursion.

Re: Building a single Hg repository

Posted by Mathias Bauer <Ma...@gmx.net>.

On 29.06.2011 05:27, Greg Stein wrote:
> [... snip ...]
>
> One more thing... I cloned one of the CWSs (ab78), and it was 2.8 GB.
> My clone of DEV300 is 3.5 GB. Is the size of that CWS typical? There
> are about 250 CWSs hosted at OOo. If the average holds, I would need
> to clone 700 GB of material down to my system to perform the
> integration.
>
> Am I missing something? Is there a better way? etc.

As your local hg repo is just an intermediate step from where you export 
to svn, you could pull all cws into one repository. The majority of the 
different repositories (master or cws) consist of almost the same
changesets, so pulling into one repo will save a lot of disk space.

This has been tried by several people and it worked for them.

Someone has created a Mercurial extension that makes this process pretty 
easy:

http://mercurial.selenic.com/wiki/BranchmirrorExtension

As some of our cws are based on the DEV300 repo and some are based on
the OOO340 repo, you will need two repos, or someone needs to copy the
cws from DEV300 to OOO340 beforehand.

Regards,
Mathias