Posted to dev@openoffice.apache.org by Christian Lohmaier <cl...@openoffice.org> on 2011/07/28 07:11:31 UTC

Converting the repo using mercurial's convert extension (was: OOO340 to svn)

Hi *,

On Wed, Jul 27, 2011 at 10:14 PM, Rob Weir <ap...@robweir.com> wrote:
> Can we review what has been attempted and what has failed?  I want to
> make sure we understand the dead-ends, so we don't retrace them.
> [...]
> 2) I assume someone has tried the Hg "convert" extensions, e.g.,  hg
> convert --dest-type svn hgreponame svnreponame .  If so, what didn't
> work right?

I just tried it, and indeed it fails right from the start, but the
failure is easy to work around.

It fails because the very first commit is completely empty, and the
conversion cannot deal with that. The solution is to map hg revision 0
to svn revision 0. This has the additional benefit that, for the
linear part of the history, the svn revision numbers match the hg
revision numbers (although that is unfortunately of little practical
use). See below for details.

But the important part is:
* You absolutely must do the conversion on a RAM disk (or at least on
a fast disk). It is I/O bound, because it replays every changeset:
it changes the files on disk, svn adds additional files, svn commits
the changes (with the changelog put into a temporary file), and the
revision is appended to the revision map.
* You can resume the conversion or do it in parts, as the conversion
creates a mapping of hg node to svn revision.
* Lacking a machine with enough RAM or a fast disk, I did not try to
reach the interesting parts. The documentation of the convert
extension says that history on branches will be lost. Better than
nothing, but maybe there's a better option.
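
The "do it in parts" point can be sketched as a loop that raises the -r limit step by step. This is only a sketch under assumptions: the function name convert_in_chunks and the step size are made up for illustration, and it relies on -r behaving as in the walkthrough below, i.e. on the linear part of the history.

```shell
# Convert an hg repo to svn in resumable steps of STEP revisions.
# Each pass reruns "hg convert" with a higher -r limit; changesets already
# listed in the revision map are skipped, so an interrupted pass can
# simply be repeated.
convert_in_chunks() {
    src=$1      # hg source repository, e.g. /tmp/DEV300
    dst=$2      # svn target repository, e.g. /tmp/conversionrepo
    last=$3     # highest hg revision number to convert
    step=$4     # revisions per pass (illustrative choice)

    rev=$step
    while [ "$rev" -lt "$last" ]; do
        hg --time convert -s hg -d svn -r "$rev" "$src" "$dst" || return 1
        rev=$((rev + step))
    done
    # Final pass up to the requested last revision.
    hg --time convert -s hg -d svn -r "$last" "$src" "$dst"
}
```

For example, "convert_in_chunks /tmp/DEV300 /tmp/conversionrepo 5000 1000" would run five passes, each of which can be timed separately.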

##################### walkthrough #####################
$ hg --time convert -s hg -d svn -r 1000 /tmp/DEV300 /tmp/conversionrepo

fails with "IndexError: list index out of range", but creates the
svn-repo and working copy

$ svn info /tmp/conversionrepo-wc/

retrieve the svn-repo's UUID to be used in the next step, in this case it's
5ba95dcc-b8db-11e0-acda-05b8bd7194b6
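
If you would rather not copy the UUID by hand, it can be filtered out of the svn info output. A small sketch, assuming the English "Repository UUID:" label (i.e. an untranslated svn); the helper name repo_uuid is made up.

```shell
# Print the repository UUID from "svn info" output read on stdin.
# Assumes the untranslated "Repository UUID: ..." line.
repo_uuid() {
    sed -n 's/^Repository UUID: //p'
}

# Usage:
#   svn info /tmp/conversionrepo-wc/ | repo_uuid
```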

$ echo "70e04cda304fa41981bdc351069f334f55598645 svn:5ba95dcc-b8db-11e0-acda-05b8bd7194b6@0" \
    > /tmp/conversionrepo-wc/.svn/hg-shamap

This maps the very first revision of the hg repo to revision 0 of the
svn repo, i.e. it makes the conversion believe that it has already
handled that revision, so it is skipped on the next try [1].
For the sake of completeness, this is how to get the hash of the first
hg revision:
$ hg --cwd /tmp/DEV300 log --template "{node}\n" -r 0
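
The two manual steps (look up the hash, write the map entry) can be combined. This is a sketch rather than a tested recipe: the function names shamap_line and seed_shamap are made up, and the "<hg node> svn:<uuid>@<revision>" line format is taken from the echo command above.

```shell
# Print one revision-map line: "<hg node> svn:<uuid>@<svn revision>".
shamap_line() {
    printf '%s svn:%s@%s\n' "$1" "$2" "$3"
}

# Map hg revision 0 to svn revision 0 and write the map into the
# working copy that hg convert created.
seed_shamap() {
    hg_repo=$1   # e.g. /tmp/DEV300
    wc=$2        # e.g. /tmp/conversionrepo-wc
    uuid=$3      # repository UUID as reported by "svn info"
    node=$(hg --cwd "$hg_repo" log --template '{node}\n' -r 0) || return 1
    shamap_line "$node" "$uuid" 0 > "$wc/.svn/hg-shamap"
}
```

Note that seed_shamap overwrites an existing hg-shamap, so it is only meant for the very first run.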

$ hg --time convert -s hg -d svn -r 1000 /tmp/DEV300 /tmp/conversionrepo

will now succeed. (If you don't like a failing command, you can also
use svnadmin yourself to create an empty repo and check it out to a
working copy named by appending "-wc" to the repo's name; this has the
same effect.)
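
The svnadmin alternative could look roughly like this. A sketch only: the function name prepare_target is made up, and the repo path is assumed to be absolute so that it can be turned into a file:// URL.

```shell
# Create an empty svn repository plus the "-wc" working copy next to it,
# so that the first hg convert run does not have to fail to create them.
prepare_target() {
    repo=$1   # absolute path, e.g. /tmp/conversionrepo
    svnadmin create "$repo" || return 1
    svn checkout "file://$repo" "${repo}-wc"
}
```

After "prepare_target /tmp/conversionrepo" the hg-shamap file can be written as described above, before hg convert is run for the first time.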

In the example, the conversion is done up to revision 1000 (use this
for timings and back-of-the-envelope maths on whether the approach is
feasible).

(it goes well with the progress extension, that way you get a
"countdown" of remaining revisions, and also an estimate of the
complete conversion)
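
Both convert and progress are extensions that have to be switched on in the Mercurial configuration. A sketch that appends the entries to an hgrc file; the function name enable_convert_extensions is made up, and later Mercurial releases ship the progress bar in core, in which case the progress line should not be needed.

```shell
# Append the two extensions used in this walkthrough to an hgrc file
# (default: the per-user ~/.hgrc).
enable_convert_extensions() {
    hgrc=${1:-$HOME/.hgrc}
    cat >> "$hgrc" <<'EOF'
[extensions]
convert =
progress =
EOF
}
```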

The initial revisions add lots of files, and with that it showed an
estimate of 4 hours, but that quickly dropped to 3. So 2½ to 3 hours
for 1000 changesets, compared to 2 hours for 500... I will follow up
later with the actual time (if my connection holds; stupid me did not
start it in screen).

Now if Florent could compare this method on his hardware we would have
comparable numbers/a direct comparison.

[1] Note that with the map, it would also be possible to reuse the old
OOo-Subversion repo for the linear commits, after all the hg repo was
a conversion from the svn server. This would save quite a bit of time.

ciao
Christian

Re: Converting the repo using mercurial's convert extension (was: OOO340 to svn)

Posted by florent andré <fl...@4sengines.com>.
Thanks, Christian, for investigating this!

As I have to go away for 5 days and my machine is already running
another conversion test, I will not be able to try it sooner... and
maybe a solution will have been found by then! :)

++





Re: Converting the repo using mercurial's convert extension

Posted by Christian Lohmaier <cl...@openoffice.org>.
On Tue, Aug 2, 2011 at 9:12 PM, Christian Lohmaier <cl...@openoffice.org> wrote:
> [...]
>> can you list the steps you are following in more detail (even a dump of your
>> term history) and upload the scripts to svn?
>
> I won't upload to svn, as I'm not a committer (and have no intentions
> to sign the iCLA)
>
> But I pasted it here
> http://pastie.org/2310454

Oh, I forgot: while I won't upload it to svn myself, I don't claim any
copyright on this bit of info, so feel free to take it as
Apache-licensed, LGPL, MPL in whatever present or future versions, or as
Public Domain - you get the idea.

ciao
Christian

Re: Converting the repo using mercurial's convert extension

Posted by Christian Lohmaier <cl...@openoffice.org>.
Hi Andrew, *,

On Tue, Aug 2, 2011 at 7:27 PM, Andrew Rist <an...@oracle.com> wrote:
> On 8/2/2011 7:25 AM, Christian Lohmaier wrote:
>> The script to do the mapping is attached [...]
>
> unfortunately attachments don't make it through the listserver.

Sorry, it at least lets patches and attached signatures pass, so I was
just hoping a text attachment would make it as well...

> can you list the steps you are following in more detail (even a dump of your
> term history) and upload the scripts to svn?

I won't upload to svn, as I'm not a committer (and have no intentions
to sign the iCLA)

But I pasted it here
http://pastie.org/2310454

> I'll try to set up a conversion and dumps closer to the source to see how we
> can speed up this process.
> Andrew

So the next step here is to create an svn dump, pass it through
svndumpfilter to keep only trunk, populate a new repo with that, and
then attempt the hg conversion.
If that fails because svn is "ahead" of mercurial, do the same but
only dump up to the last matched revision. That way you will still
have the time benefit, but not all of svn's history.
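
As a rough sketch of that dump/filter/load step (untested against the real repo; the function name filter_trunk is made up, and "trunk" is assumed to be the path prefix at the repository root):

```shell
# Copy only trunk from repository SRC into a fresh repository DST.
# Optional third argument: last svn revision to include in the dump.
filter_trunk() {
    src=$1; dst=$2; last=${3:-}
    svnadmin create "$dst" || return 1
    if [ -n "$last" ]; then
        svnadmin dump -r "0:$last" "$src"
    else
        svnadmin dump "$src"
    fi | svndumpfilter include trunk | svnadmin load "$dst"
}
```

svndumpfilter is deliberately run without --drop-empty-revs or --renumber-revs here, because the revision map only fits if the svn revision numbers stay unchanged. One known limitation: svndumpfilter can refuse a dump in which trunk contains copies from paths outside trunk.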

ciao
Christian

Re: Converting the repo using mercurial's convert extension

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Andrew Rist wrote on Tue, Aug 02, 2011 at 10:27:40 -0700:
> On 8/2/2011 7:25 AM, Christian Lohmaier wrote:
> >The script to do the mapping is attached
> unfortunately attachments don't make it through the listserver.

IIRC, the list only blocks some kinds of attachments.

And that can be disabled by a request to infra --- we did that in
dev@subversion.  See INFRA-3724

Re: Converting the repo using mercurial's convert extension

Posted by Andrew Rist <an...@oracle.com>.


On 8/2/2011 7:25 AM, Christian Lohmaier wrote:
> Hi Heiner, *,
>
> On Thu, Jul 28, 2011 at 8:42 PM, Jens-Heiner Rechtien<jh...@web.de>  wrote:
>> On 07/28/2011 04:32 PM, Pedro F. Giffuni wrote:
>>> --- On Thu, 7/28/11, Christian Lohmaier wrote:
>>> ...
>>>> [1] Note that with the map, it would also be possible to
>>>> reuse the old OOo-Subversion repo for the linear commits,
>>>> after all the hg repo was a conversion from the svn server.
>>>> This would save quite a bit of time.
>>> I like this idea ... if the old SVN server is still available
>>> and we can do a progressive conversion of the rest of the Hg
>>> stuff we will save a lot of metadata that had previously been
>>> lost (plus we save conversion time).
>> It's still available: http://svn.services.openoffice.org/ooo resp.
>> svn://svn.services.openoffice.org/ooo
>>
>> You can use svnsync to create a local copy of the rep. Will take a while :-)
> Yes, takes almost two days - would have been easier if there was a dump :-)
>
> Now good thing: mapping the revisions works, and is completed on a
> slow machine in 10 minutes.
> Bad thing: I couldn't test the import with the plain repo, as hg
> convert wants a *full* checkout of the repo, not just trunk, and that
> doesn't fit in the 100GB I have available[1], so I'm now creating a
> dump to run through svndumpfilter to only preserve trunk and retry
> with that shrunk repo. (
>
> The script to do the mapping is attached, it will create the mapping
> that can be used as hg-shamap to kickstart the conversing skipping the
> first 263205 revision, thus saving way more than 20 days of conversion
> time :-)

Unfortunately, attachments don't make it through the listserver.
Can you list the steps you are following in more detail (even a dump of
your terminal history) and upload the scripts to svn?
I'll try to set up a conversion and dumps closer to the source to see
how we can speed up this process.
Andrew

>
> So while the process won't work with the pristine copy of the pristine
> svn copy, it can still be used as basis for the conversion when
> filtered to only include trunk, as all the linear revisions are
> matched in trunk.
>
> [1] the svn repo contains 984 cws - and when you assume just 1 GB per
> cws for simplicity, you would need 1TB of storage to just do the
> conversion. and svn needs loads of RAM to perform the checkout - the
> 1GB of real RAM and 1GB of RAM was all occupied by svn, and after
> filling the 100GB a mere 12MB were free, so even with enough storage
> capacity, the amount of RAM probably would not have been sufficient
> for a full checkout...
>
> ciao
> Christian

Re: Converting the repo using mercurial's convert extension

Posted by "Pedro F. Giffuni" <gi...@tutopia.com>.
Hello guys;

I looked a bit at the Mercurial conversion tool, and maybe
I'm reading something wrong, but there is a --filemap
option with which one can exclude directories from the conversion.

The tool does seem to support branches, though, so perhaps
it is worth spending 1TB of space and waiting.
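
For reference, a filemap is a plain text file with include, exclude and rename lines. A sketch that generates one with only exclude lines; the helper name write_exclude_filemap and the directory names in the usage comment are made up.

```shell
# Write a filemap that excludes the given directories from the conversion.
# Paths are assumed to contain no whitespace.
write_exclude_filemap() {
    out=$1; shift
    : > "$out"
    for dir in "$@"; do
        printf 'exclude %s\n' "$dir" >> "$out"
    done
}

# Usage (directory names are hypothetical):
#   write_exclude_filemap /tmp/filemap.txt some/dir another/dir
#   hg convert --filemap /tmp/filemap.txt -s hg -d svn /tmp/DEV300 /tmp/conversionrepo
```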

cheers,

Pedro.


Re: Converting the repo using mercurial's convert extension

Posted by Christian Lohmaier <cl...@openoffice.org>.
Hi Heiner, *,

On Thu, Jul 28, 2011 at 8:42 PM, Jens-Heiner Rechtien <jh...@web.de> wrote:
> On 07/28/2011 04:32 PM, Pedro F. Giffuni wrote:
>> --- On Thu, 7/28/11, Christian Lohmaier wrote:
>> ...
>>>
>>> [1] Note that with the map, it would also be possible to
>>> reuse the old OOo-Subversion repo for the linear commits,
>>> after all the hg repo was a conversion from the svn server.
>>> This would save quite a bit of time.
>>
>> I like this idea ... if the old SVN server is still available
>> and we can do a progressive conversion of the rest of the Hg
>> stuff we will save a lot of metadata that had previously been
>> lost (plus we save conversion time).
>
> It's still available: http://svn.services.openoffice.org/ooo resp.
> svn://svn.services.openoffice.org/ooo
>
> You can use svnsync to create a local copy of the rep. Will take a while :-)

Yes, takes almost two days - would have been easier if there was a dump :-)

Now the good thing: mapping the revisions works, and it completes on a
slow machine in 10 minutes.
The bad thing: I couldn't test the import with the plain repo, as hg
convert wants a *full* checkout of the repo, not just trunk, and that
doesn't fit in the 100GB I have available [1]. So I'm now creating a
dump to run through svndumpfilter to preserve only trunk, and will
retry with that shrunk repo.

The script to do the mapping is attached. It will create the mapping
that can be used as hg-shamap to kickstart the conversion, skipping the
first 263205 revisions and thus saving way more than 20 days of
conversion time :-)
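
The "more than 20 days" figure can be cross-checked against the earlier timing of 2½ to 3 hours per 1000 changesets. A back-of-the-envelope sketch, assuming that rate stays constant over the whole history (which it may well not); the function name estimate_days is made up.

```shell
# Estimate conversion time in whole days (integer arithmetic, rounded down).
estimate_days() {
    revisions=$1          # number of changesets to convert
    minutes_per_1000=$2   # measured minutes per 1000 changesets
    echo $(( revisions * minutes_per_1000 / 1000 / 60 / 24 ))
}

estimate_days 263205 150   # 2.5 h per 1000 changesets -> prints 27
estimate_days 263205 180   # 3 h per 1000 changesets   -> prints 32
```

So under that assumption the skipped part alone would correspond to roughly four weeks of conversion time.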

So while the process won't work with a pristine copy of the svn repo,
that copy can still be used as the basis for the conversion when
filtered to only include trunk, as all the linear revisions are
matched in trunk.

[1] The svn repo contains 984 cws, and if you assume just 1 GB per
cws for simplicity, you would need 1TB of storage just to do the
conversion. Svn also needs loads of RAM to perform the checkout: the
1GB of real RAM and 1GB of swap were all occupied by svn, and after
filling the 100GB a mere 12MB were free. So even with enough storage
capacity, the amount of RAM probably would not have been sufficient
for a full checkout...

ciao
Christian

Re: Converting the repo using mercurial's convert extension

Posted by Herbert Duerr <hd...@alice.de>.
On 07/28/2011 08:42 PM, Jens-Heiner Rechtien wrote:
> But, frankly, I can't see the need of having the CVS stuff at hand. It's
> very hard to make sense of this historical data anyway, at least if you
> haven't got a decade of OOo developer knowledge under the belt.
>
> It's true that the conversion was lossy, but that was intentional! You
> wouldn't believe how much cruft can accumulate in a decade of happy
> coding. A full conversion of our old CVS repository into SVN resulted in
> a SVN repository of about 90 GiB in size.

I know many developers only care about tip and maybe the head of 
branches. It is what matters most and so people were able to develop 
code long before VCS were available.

I disagree you need to have a decade of OOo developer knowledge to make 
use of it. Quite the opposite indeed! If someone new wants to work on 
some piece of code and isn't sure why it was coded that way then it is 
extremely helpful to look at its commit comments and especially the 
issue numbers mentioned there. The info in the issue and the attached 
documents often show corner use cases that better be handled properly 
even when the code is to be refactored.

Have a look at e.g.
http://hg.services.openoffice.org/DEV300/shortlog/19d852424fb4
or
http://hg.services.openoffice.org/DEV300/shortlog/a70e5539c48b

I don't believe for a second that someone who is interested in the use
cases and history of some piece of code will be happy when he finds
a "CWS-TOOLING: integrate CWS vcl92" and no way to find out what the
original commit comments were.

Herbert

Re: Converting the repo using mercurial's convert extension

Posted by "Pedro F. Giffuni" <gi...@tutopia.com>.

--- On Thu, 7/28/11, Jens-Heiner Rechtien wrote:

> On 07/28/2011 04:32 PM, Pedro F. Giffuni wrote:
> >
> > --- On Thu, 7/28/11, Christian Lohmaier wrote:
> > ...
> >>
> >> [1] Note that with the map, it would also be
> >> possible to reuse the old OOo-Subversion repo
> >> for the linear commits, after all the hg repo
> >> was a conversion from the svn server.
> >> This would save quite a bit of time.
> >>
> >
> > I like this idea ... if the old SVN server is
> > still available and we can do a progressive
> > conversion of the rest of the Hg
> > stuff we will save a lot of metadata that had
> > previously been lost (plus we save conversion
> > time).
> 
> It's still available: http://svn.services.openoffice.org/ooo 
> resp. svn://svn.services.openoffice.org/ooo
>

Very, very nice.. it even has some old branches, perhaps those
can be updated progressively too at a later time! FWIW, it's
running Subversion 1.5.4. 

There are also some extra tools: "The CWS tooling has been
reworked to adapt to SVN.".


> You can use svnsync to create a local copy of the rep. Will
> take a while :-)
>

Ugh... I don't have the space locally to do even this :(,
but it certainly looks like we can save a lot of stuff
(and time) from there.
 
> >
> > I suspect the original CVS stuff that was lost in SVN
> > conversion is gone for good by now.
> 
> Oh, I'm pretty sure that it's still available somewhere.
> There have been some CVSup mirrors of the stuff.
> 
> But, frankly, I can't see the need of having the CVS stuff
> at hand. It's very hard to make sense of this historical
> data anyway, at least if you haven't got a decade of OOo
> developer knowledge under the belt.
>

Yes, I think the SVN conversion already has what could be
recovered easily, I doubt a newer conversion tool can add
much to it.

Thanks for the link. I hope we end up going this way!

Pedro.

Re: Converting the repo using mercurial's convert extension

Posted by Jens-Heiner Rechtien <jh...@web.de>.
On 07/28/2011 04:32 PM, Pedro F. Giffuni wrote:
>
> --- On Thu, 7/28/11, Christian Lohmaier wrote:
> ...
>>
>> [1] Note that with the map, it would also be possible to
>> reuse the old OOo-Subversion repo for the linear commits,
>> after all the hg repo was a conversion from the svn server.
>> This would save quite a bit of time.
>>
>
> I like this idea ... if the old SVN server is still available
> and we can do a progressive conversion of the rest of the Hg
> stuff we will save a lot of metadata that had previously been
> lost (plus we save conversion time).

It's still available: http://svn.services.openoffice.org/ooo resp. 
svn://svn.services.openoffice.org/ooo

You can use svnsync to create a local copy of the rep. Will take a while :-)
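
For anyone who has not used svnsync before, creating such a mirror takes roughly the following steps. A sketch, not a tested recipe: the function name mirror_repo is made up, and the mirror path is assumed to be absolute.

```shell
# Create a local mirror of a remote svn repository with svnsync.
mirror_repo() {
    url=$1      # e.g. svn://svn.services.openoffice.org/ooo
    mirror=$2   # absolute path of the local mirror to create
    svnadmin create "$mirror" || return 1
    # svnsync sets revision properties, which a fresh repository refuses
    # until a pre-revprop-change hook allows it.
    printf '#!/bin/sh\nexit 0\n' > "$mirror/hooks/pre-revprop-change"
    chmod +x "$mirror/hooks/pre-revprop-change"
    svnsync init "file://$mirror" "$url" || return 1
    svnsync sync "file://$mirror"
}
```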

>
> I suspect the original CVS stuff that was lost in SVN conversion
> is gone for good by now.

Oh, I'm pretty sure that it's still available somewhere. There have been 
some CVSup mirrors of the stuff.

But, frankly, I can't see the need of having the CVS stuff at hand. It's 
very hard to make sense of this historical data anyway, at least if you 
haven't got a decade of OOo developer knowledge under the belt.

It's true that the conversion was lossy, but that was intentional! You 
wouldn't believe how much cruft can accumulate in a decade of happy 
coding. A full conversion of our old CVS repository into SVN resulted in 
a SVN repository of about 90 GiB in size.

>
> Also, having the complete repository (with branches) in google
> code sounds like a good approach.
>
> Pedro.
>
>

Heiner


-- 
Jens-Heiner Rechtien

Re: Converting the repo using mercurial's convert extension (was: OOO340 to svn)

Posted by "Pedro F. Giffuni" <gi...@tutopia.com>.
--- On Thu, 7/28/11, Christian Lohmaier wrote:
...
> 
> [1] Note that with the map, it would also be possible to
> reuse the old OOo-Subversion repo for the linear commits,
> after all the hg repo was a conversion from the svn server.
> This would save quite a bit of time.
>

I like this idea ... if the old SVN server is still available
and we can do a progressive conversion of the rest of the Hg
stuff we will save a lot of metadata that had previously been
lost (plus we save conversion time).

I suspect the original CVS stuff that was lost in SVN conversion
is gone for good by now.

Also, having the complete repository (with branches) in google
code sounds like a good approach.

Pedro.