Posted to dev@couchdb.apache.org by Dustin Sallings <du...@spy.net> on 2011/08/08 20:57:33 UTC

git conversion (was: Re: dev Digest of: thread.17379)

	[sorry for screwing up the subject]

On Aug 8, 2011, at 11:22 AM, Paul Davis wrote:

> Nice! Any hints you have about validating SVN->Git conversions or
> tooling would be greatly appreciated. I don't really have much other
> than the obvious Graphviz plotting tool. Beyond that I don't have
> anything other than getting each TLP to verify their own history.

	As part of the memcached conversion, I wrote a tool that takes two git branches that have no common history but are expected to converge on the same tree state, and shows a graphical diff between them.  It produces output that looks like this:

		http://public.west.spy.net/memcached/compare.html

	As it is, it won't help you much if you're planning to move source around *during* the conversion, but it does a good job of verifying that you didn't break anything in rebasing, updating commit messages, changing authors, committers, etc...

	Basically, you just need two refs in a single git repo (old-branch vs. rewritten-stuff).  Run "git tree-converge old-branch rewritten-stuff" and you get tons of HTML spewed at you.

	https://github.com/dustin/bindir/blob/master/git-tree-converge
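For readers who want to see the underlying idea without the HTML output, here is a minimal sketch in Python. It is not the git-tree-converge script itself; the function names (tree_hashes, first_divergence, converged) are made up for illustration. It assumes both refs live in the same repository and that each history is close to linear, since "git log --reverse" flattens merges into a single sequence.

```python
import subprocess


def tree_hashes(ref, repo="."):
    """Return the tree hash (%T) of every commit reachable from ref, oldest first.

    Assumption: history is roughly linear; merges are flattened by git log.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%T", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def first_divergence(old_trees, new_trees):
    """Index of the first position where the two sequences differ, else None."""
    for i, (a, b) in enumerate(zip(old_trees, new_trees)):
        if a != b:
            return i
    if len(old_trees) != len(new_trees):
        # One history is a strict prefix of the other.
        return min(len(old_trees), len(new_trees))
    return None


def converged(old_trees, new_trees):
    """True when both histories end on the same tree state."""
    return bool(old_trees) and bool(new_trees) and old_trees[-1] == new_trees[-1]
```

Usage would look something like converged(tree_hashes("old-branch"), tree_hashes("rewritten-stuff")). Because tree hashes ignore author, committer, and message, rewriting that metadata should leave the two lists identical.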

> I'm also not sure if it makes a difference, but the ASF SVN repo is
> one huge monolithic thing, so it's a lot of project histories
> intertwined which I'm looking forward to finding awesome conversion
> bugs with.


	The biggest problem I've had with such things is actually having svn be willing to give up the history.  As long as you can get it out in any way at all, we can fix it.  The worst case would be doing a complete reproduction of the svn history in a monolithic git repo.  I can work with that.  It's likely unnecessary.

	My experience with svn has never been good and I wouldn't call myself an expert there, but if we can get the content out successfully, I can help you do all kinds of junk with it.

-- 
dustin sallings




Re: git conversion (was: Re: dev Digest of: thread.17379)

Posted by Dustin Sallings <du...@spy.net>.
On Aug 8, 2011, at 12:24 PM, Paul Davis wrote:

> Oh, totally forgot to mention, I'm not entirely certain what that
> graph is supposed to be showing. Basically that the set of commits on
> the left is equivalent to that on the right?


	It's tree states.  Regardless of the commits themselves, wherever there's white, everything is in sync.  Red on the left is a change that does not appear on the right (and vice versa for green on the right).

	If there's a bunch of white and then a red, that means that that change made the codebase diverge.  When it goes white again, they got back in sync.
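The coloring described above can be approximated with a sequence alignment over tree hashes. The following is a rough sketch only, assuming each side has already been reduced to an ordered list of tree hashes; color_rows is a hypothetical name, and the real tool may align the two sides differently. It uses Python's standard difflib module.

```python
from difflib import SequenceMatcher


def color_rows(left, right):
    """Align two lists of tree hashes and label each row.

    white: the same tree state appears on both sides (in sync)
    red:   a tree state that appears only on the left
    green: a tree state that appears only on the right
    """
    rows = []
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows += [("white", t, t) for t in left[i1:i2]]
        else:
            # For 'replace', 'delete' and 'insert', emit whatever each side has.
            rows += [("red", t, None) for t in left[i1:i2]]
            rows += [("green", None, t) for t in right[j1:j2]]
    return rows
```

A run of white rows, then red or green rows, then white again corresponds to the "diverged and got back in sync" pattern.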

	As far as verifying the end-user commits... I'm not exactly sure what I'd want to see as a user looking at my stuff in isolation.  What would you want to see?

-- 
dustin sallings




Re: git conversion (was: Re: dev Digest of: thread.17379)

Posted by Paul Davis <pa...@gmail.com>.
On Mon, Aug 8, 2011 at 2:23 PM, Paul Davis <pa...@gmail.com> wrote:
>
> I'm mostly looking for tools to let people look at a Git history and
> verify that it matches their SVN history. CouchDB's SVN to Git
> migration is basically a test case for all of the ASF. Assuming it
> goes well there will be other projects wanting to switch so I'm trying
> to think ahead to what they might want to see.
>
> It's always possible to get the data out, but the thing to realize is
> that the ASF SVN repo is over 65GiB with over 1.1M commits. Brute
> forcing the conversion is probably not the most sane approach if it
> can be avoided.
>
> Thanks for the input.
>

Oh, totally forgot to mention, I'm not entirely certain what that
graph is supposed to be showing. Basically that the set of commits on
the left is equivalent to that on the right?

Re: git conversion (was: Re: dev Digest of: thread.17379)

Posted by Paul Davis <pa...@gmail.com>.

I'm mostly looking for tools to let people look at a Git history and
verify that it matches their SVN history. CouchDB's SVN to Git
migration is basically a test case for all of the ASF. Assuming it
goes well there will be other projects wanting to switch so I'm trying
to think ahead to what they might want to see.

It's always possible to get the data out, but the thing to realize is
that the ASF SVN repo is over 65GiB with over 1.1M commits. Brute
forcing the conversion is probably not the most sane approach if it
can be avoided.

Thanks for the input.